diff --git a/DESCRIPTION b/DESCRIPTION index 0c7ca306c..9472cac91 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -3,7 +3,7 @@ Title: Spatial Single-Cell Transcriptomics Toolbox Version: 4.0.6 Authors@R: c( person("Ruben", "Dries", email = "rubendries@gmail.com", - role = c("aut", "cre")), + role = c("aut", "cre"), comment = c(ORCID = "0000-0001-7650-7754")), person("Jiaji", "George Chen", email = "jiajic@bu.edu", role = c("aut")), person("Joselyn C.", "Chávez-Fuentes", email = "joselynchavezf@gmail.com", @@ -19,9 +19,8 @@ Authors@R: c( person("Natalie", "Del Rossi", email = "natalie.delrossi@mssm.edu", role = c("aut")) ) -Maintainer: Ruben Dries Description: Toolbox to process, analyze and visualize spatial single-cell expression data. -License: GPL (>= 3) +License: MIT + file LICENSE Encoding: UTF-8 LazyData: true URL: https://drieslab.github.io/Giotto/, https://github.com/drieslab/Giotto @@ -127,10 +126,6 @@ Suggests: trendsceek, testthat (>= 3.0.0), qs -Remotes: - drieslab/GiottoUtils, - drieslab/GiottoClass, - drieslab/GiottoVisuals Collate: 'auxiliary_giotto.R' 'cell_segmentation.R' @@ -164,6 +159,6 @@ Collate: 'variable_genes.R' 'wnn.R' 'zzz.R' -biocViews: +biocViews: Software, Technology, Spatial, SingleCell, Transcriptomics VignetteBuilder: knitr Config/testthat/edition: 3 diff --git a/LICENSE b/LICENSE new file mode 100644 index 000000000..c6016a215 --- /dev/null +++ b/LICENSE @@ -0,0 +1,2 @@ +YEAR: 2024 +COPYRIGHT HOLDER: Giotto authors diff --git a/LICENSE.md b/LICENSE.md index 175443ce8..2a7ea5db3 100644 --- a/LICENSE.md +++ b/LICENSE.md @@ -1,595 +1,21 @@ -GNU General Public License -========================== - -_Version 3, 29 June 2007_ -_Copyright © 2007 Free Software Foundation, Inc. <>_ - -Everyone is permitted to copy and distribute verbatim copies of this license -document, but changing it is not allowed. - -## Preamble - -The GNU General Public License is a free, copyleft license for software and other -kinds of works. - -The licenses for most software and other practical works are designed to take away -your freedom to share and change the works. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change all versions of a -program--to make sure it remains free software for all its users. We, the Free -Software Foundation, use the GNU General Public License for most of our software; it -applies also to any other work released this way by its authors. You can apply it to -your programs, too. - -When we speak of free software, we are referring to freedom, not price. Our General -Public Licenses are designed to make sure that you have the freedom to distribute -copies of free software (and charge for them if you wish), that you receive source -code or can get it if you want it, that you can change the software or use pieces of -it in new free programs, and that you know you can do these things. - -To protect your rights, we need to prevent others from denying you these rights or -asking you to surrender the rights. Therefore, you have certain responsibilities if -you distribute copies of the software, or if you modify it: responsibilities to -respect the freedom of others. - -For example, if you distribute copies of such a program, whether gratis or for a fee, -you must pass on to the recipients the same freedoms that you received. You must make -sure that they, too, receive or can get the source code. And you must show them these -terms so they know their rights. 
- -Developers that use the GNU GPL protect your rights with two steps: **(1)** assert -copyright on the software, and **(2)** offer you this License giving you legal permission -to copy, distribute and/or modify it. - -For the developers' and authors' protection, the GPL clearly explains that there is -no warranty for this free software. For both users' and authors' sake, the GPL -requires that modified versions be marked as changed, so that their problems will not -be attributed erroneously to authors of previous versions. - -Some devices are designed to deny users access to install or run modified versions of -the software inside them, although the manufacturer can do so. This is fundamentally -incompatible with the aim of protecting users' freedom to change the software. The -systematic pattern of such abuse occurs in the area of products for individuals to -use, which is precisely where it is most unacceptable. Therefore, we have designed -this version of the GPL to prohibit the practice for those products. If such problems -arise substantially in other domains, we stand ready to extend this provision to -those domains in future versions of the GPL, as needed to protect the freedom of -users. - -Finally, every program is threatened constantly by software patents. States should -not allow patents to restrict development and use of software on general-purpose -computers, but in those that do, we wish to avoid the special danger that patents -applied to a free program could make it effectively proprietary. To prevent this, the -GPL assures that patents cannot be used to render the program non-free. - -The precise terms and conditions for copying, distribution and modification follow. - -## TERMS AND CONDITIONS - -### 0. Definitions - -“This License” refers to version 3 of the GNU General Public License. - -“Copyright” also means copyright-like laws that apply to other kinds of -works, such as semiconductor masks. - -“The Program” refers to any copyrightable work licensed under this -License. Each licensee is addressed as “you”. “Licensees” and -“recipients” may be individuals or organizations. - -To “modify” a work means to copy from or adapt all or part of the work in -a fashion requiring copyright permission, other than the making of an exact copy. The -resulting work is called a “modified version” of the earlier work or a -work “based on” the earlier work. - -A “covered work” means either the unmodified Program or a work based on -the Program. - -To “propagate” a work means to do anything with it that, without -permission, would make you directly or secondarily liable for infringement under -applicable copyright law, except executing it on a computer or modifying a private -copy. Propagation includes copying, distribution (with or without modification), -making available to the public, and in some countries other activities as well. - -To “convey” a work means any kind of propagation that enables other -parties to make or receive copies. Mere interaction with a user through a computer -network, with no transfer of a copy, is not conveying. - -An interactive user interface displays “Appropriate Legal Notices” to the -extent that it includes a convenient and prominently visible feature that **(1)** -displays an appropriate copyright notice, and **(2)** tells the user that there is no -warranty for the work (except to the extent that warranties are provided), that -licensees may convey the work under this License, and how to view a copy of this -License. 
If the interface presents a list of user commands or options, such as a -menu, a prominent item in the list meets this criterion. - -### 1. Source Code - -The “source code” for a work means the preferred form of the work for -making modifications to it. “Object code” means any non-source form of a -work. - -A “Standard Interface” means an interface that either is an official -standard defined by a recognized standards body, or, in the case of interfaces -specified for a particular programming language, one that is widely used among -developers working in that language. - -The “System Libraries” of an executable work include anything, other than -the work as a whole, that **(a)** is included in the normal form of packaging a Major -Component, but which is not part of that Major Component, and **(b)** serves only to -enable use of the work with that Major Component, or to implement a Standard -Interface for which an implementation is available to the public in source code form. -A “Major Component”, in this context, means a major essential component -(kernel, window system, and so on) of the specific operating system (if any) on which -the executable work runs, or a compiler used to produce the work, or an object code -interpreter used to run it. - -The “Corresponding Source” for a work in object code form means all the -source code needed to generate, install, and (for an executable work) run the object -code and to modify the work, including scripts to control those activities. However, -it does not include the work's System Libraries, or general-purpose tools or -generally available free programs which are used unmodified in performing those -activities but which are not part of the work. For example, Corresponding Source -includes interface definition files associated with source files for the work, and -the source code for shared libraries and dynamically linked subprograms that the work -is specifically designed to require, such as by intimate data communication or -control flow between those subprograms and other parts of the work. - -The Corresponding Source need not include anything that users can regenerate -automatically from other parts of the Corresponding Source. - -The Corresponding Source for a work in source code form is that same work. - -### 2. Basic Permissions - -All rights granted under this License are granted for the term of copyright on the -Program, and are irrevocable provided the stated conditions are met. This License -explicitly affirms your unlimited permission to run the unmodified Program. The -output from running a covered work is covered by this License only if the output, -given its content, constitutes a covered work. This License acknowledges your rights -of fair use or other equivalent, as provided by copyright law. - -You may make, run and propagate covered works that you do not convey, without -conditions so long as your license otherwise remains in force. You may convey covered -works to others for the sole purpose of having them make modifications exclusively -for you, or provide you with facilities for running those works, provided that you -comply with the terms of this License in conveying all material for which you do not -control copyright. Those thus making or running the covered works for you must do so -exclusively on your behalf, under your direction and control, on terms that prohibit -them from making any copies of your copyrighted material outside their relationship -with you. 
- -Conveying under any other circumstances is permitted solely under the conditions -stated below. Sublicensing is not allowed; section 10 makes it unnecessary. - -### 3. Protecting Users' Legal Rights From Anti-Circumvention Law - -No covered work shall be deemed part of an effective technological measure under any -applicable law fulfilling obligations under article 11 of the WIPO copyright treaty -adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention -of such measures. - -When you convey a covered work, you waive any legal power to forbid circumvention of -technological measures to the extent such circumvention is effected by exercising -rights under this License with respect to the covered work, and you disclaim any -intention to limit operation or modification of the work as a means of enforcing, -against the work's users, your or third parties' legal rights to forbid circumvention -of technological measures. - -### 4. Conveying Verbatim Copies - -You may convey verbatim copies of the Program's source code as you receive it, in any -medium, provided that you conspicuously and appropriately publish on each copy an -appropriate copyright notice; keep intact all notices stating that this License and -any non-permissive terms added in accord with section 7 apply to the code; keep -intact all notices of the absence of any warranty; and give all recipients a copy of -this License along with the Program. - -You may charge any price or no price for each copy that you convey, and you may offer -support or warranty protection for a fee. - -### 5. Conveying Modified Source Versions - -You may convey a work based on the Program, or the modifications to produce it from -the Program, in the form of source code under the terms of section 4, provided that -you also meet all of these conditions: - -* **a)** The work must carry prominent notices stating that you modified it, and giving a -relevant date. -* **b)** The work must carry prominent notices stating that it is released under this -License and any conditions added under section 7. This requirement modifies the -requirement in section 4 to “keep intact all notices”. -* **c)** You must license the entire work, as a whole, under this License to anyone who -comes into possession of a copy. This License will therefore apply, along with any -applicable section 7 additional terms, to the whole of the work, and all its parts, -regardless of how they are packaged. This License gives no permission to license the -work in any other way, but it does not invalidate such permission if you have -separately received it. -* **d)** If the work has interactive user interfaces, each must display Appropriate Legal -Notices; however, if the Program has interactive interfaces that do not display -Appropriate Legal Notices, your work need not make them do so. - -A compilation of a covered work with other separate and independent works, which are -not by their nature extensions of the covered work, and which are not combined with -it such as to form a larger program, in or on a volume of a storage or distribution -medium, is called an “aggregate” if the compilation and its resulting -copyright are not used to limit the access or legal rights of the compilation's users -beyond what the individual works permit. Inclusion of a covered work in an aggregate -does not cause this License to apply to the other parts of the aggregate. - -### 6. 
Conveying Non-Source Forms - -You may convey a covered work in object code form under the terms of sections 4 and -5, provided that you also convey the machine-readable Corresponding Source under the -terms of this License, in one of these ways: - -* **a)** Convey the object code in, or embodied in, a physical product (including a -physical distribution medium), accompanied by the Corresponding Source fixed on a -durable physical medium customarily used for software interchange. -* **b)** Convey the object code in, or embodied in, a physical product (including a -physical distribution medium), accompanied by a written offer, valid for at least -three years and valid for as long as you offer spare parts or customer support for -that product model, to give anyone who possesses the object code either **(1)** a copy of -the Corresponding Source for all the software in the product that is covered by this -License, on a durable physical medium customarily used for software interchange, for -a price no more than your reasonable cost of physically performing this conveying of -source, or **(2)** access to copy the Corresponding Source from a network server at no -charge. -* **c)** Convey individual copies of the object code with a copy of the written offer to -provide the Corresponding Source. This alternative is allowed only occasionally and -noncommercially, and only if you received the object code with such an offer, in -accord with subsection 6b. -* **d)** Convey the object code by offering access from a designated place (gratis or for -a charge), and offer equivalent access to the Corresponding Source in the same way -through the same place at no further charge. You need not require recipients to copy -the Corresponding Source along with the object code. If the place to copy the object -code is a network server, the Corresponding Source may be on a different server -(operated by you or a third party) that supports equivalent copying facilities, -provided you maintain clear directions next to the object code saying where to find -the Corresponding Source. Regardless of what server hosts the Corresponding Source, -you remain obligated to ensure that it is available for as long as needed to satisfy -these requirements. -* **e)** Convey the object code using peer-to-peer transmission, provided you inform -other peers where the object code and Corresponding Source of the work are being -offered to the general public at no charge under subsection 6d. - -A separable portion of the object code, whose source code is excluded from the -Corresponding Source as a System Library, need not be included in conveying the -object code work. - -A “User Product” is either **(1)** a “consumer product”, which -means any tangible personal property which is normally used for personal, family, or -household purposes, or **(2)** anything designed or sold for incorporation into a -dwelling. In determining whether a product is a consumer product, doubtful cases -shall be resolved in favor of coverage. For a particular product received by a -particular user, “normally used” refers to a typical or common use of -that class of product, regardless of the status of the particular user or of the way -in which the particular user actually uses, or expects or is expected to use, the -product. A product is a consumer product regardless of whether the product has -substantial commercial, industrial or non-consumer uses, unless such uses represent -the only significant mode of use of the product. 
- -“Installation Information” for a User Product means any methods, -procedures, authorization keys, or other information required to install and execute -modified versions of a covered work in that User Product from a modified version of -its Corresponding Source. The information must suffice to ensure that the continued -functioning of the modified object code is in no case prevented or interfered with -solely because modification has been made. - -If you convey an object code work under this section in, or with, or specifically for -use in, a User Product, and the conveying occurs as part of a transaction in which -the right of possession and use of the User Product is transferred to the recipient -in perpetuity or for a fixed term (regardless of how the transaction is -characterized), the Corresponding Source conveyed under this section must be -accompanied by the Installation Information. But this requirement does not apply if -neither you nor any third party retains the ability to install modified object code -on the User Product (for example, the work has been installed in ROM). - -The requirement to provide Installation Information does not include a requirement to -continue to provide support service, warranty, or updates for a work that has been -modified or installed by the recipient, or for the User Product in which it has been -modified or installed. Access to a network may be denied when the modification itself -materially and adversely affects the operation of the network or violates the rules -and protocols for communication across the network. - -Corresponding Source conveyed, and Installation Information provided, in accord with -this section must be in a format that is publicly documented (and with an -implementation available to the public in source code form), and must require no -special password or key for unpacking, reading or copying. - -### 7. Additional Terms - -“Additional permissions” are terms that supplement the terms of this -License by making exceptions from one or more of its conditions. Additional -permissions that are applicable to the entire Program shall be treated as though they -were included in this License, to the extent that they are valid under applicable -law. If additional permissions apply only to part of the Program, that part may be -used separately under those permissions, but the entire Program remains governed by -this License without regard to the additional permissions. - -When you convey a copy of a covered work, you may at your option remove any -additional permissions from that copy, or from any part of it. (Additional -permissions may be written to require their own removal in certain cases when you -modify the work.) You may place additional permissions on material, added by you to a -covered work, for which you have or can give appropriate copyright permission. 
- -Notwithstanding any other provision of this License, for material you add to a -covered work, you may (if authorized by the copyright holders of that material) -supplement the terms of this License with terms: - -* **a)** Disclaiming warranty or limiting liability differently from the terms of -sections 15 and 16 of this License; or -* **b)** Requiring preservation of specified reasonable legal notices or author -attributions in that material or in the Appropriate Legal Notices displayed by works -containing it; or -* **c)** Prohibiting misrepresentation of the origin of that material, or requiring that -modified versions of such material be marked in reasonable ways as different from the -original version; or -* **d)** Limiting the use for publicity purposes of names of licensors or authors of the -material; or -* **e)** Declining to grant rights under trademark law for use of some trade names, -trademarks, or service marks; or -* **f)** Requiring indemnification of licensors and authors of that material by anyone -who conveys the material (or modified versions of it) with contractual assumptions of -liability to the recipient, for any liability that these contractual assumptions -directly impose on those licensors and authors. - -All other non-permissive additional terms are considered “further -restrictions” within the meaning of section 10. If the Program as you received -it, or any part of it, contains a notice stating that it is governed by this License -along with a term that is a further restriction, you may remove that term. If a -license document contains a further restriction but permits relicensing or conveying -under this License, you may add to a covered work material governed by the terms of -that license document, provided that the further restriction does not survive such -relicensing or conveying. - -If you add terms to a covered work in accord with this section, you must place, in -the relevant source files, a statement of the additional terms that apply to those -files, or a notice indicating where to find the applicable terms. - -Additional terms, permissive or non-permissive, may be stated in the form of a -separately written license, or stated as exceptions; the above requirements apply -either way. - -### 8. Termination - -You may not propagate or modify a covered work except as expressly provided under -this License. Any attempt otherwise to propagate or modify it is void, and will -automatically terminate your rights under this License (including any patent licenses -granted under the third paragraph of section 11). - -However, if you cease all violation of this License, then your license from a -particular copyright holder is reinstated **(a)** provisionally, unless and until the -copyright holder explicitly and finally terminates your license, and **(b)** permanently, -if the copyright holder fails to notify you of the violation by some reasonable means -prior to 60 days after the cessation. - -Moreover, your license from a particular copyright holder is reinstated permanently -if the copyright holder notifies you of the violation by some reasonable means, this -is the first time you have received notice of violation of this License (for any -work) from that copyright holder, and you cure the violation prior to 30 days after -your receipt of the notice. - -Termination of your rights under this section does not terminate the licenses of -parties who have received copies or rights from you under this License. 
If your -rights have been terminated and not permanently reinstated, you do not qualify to -receive new licenses for the same material under section 10. - -### 9. Acceptance Not Required for Having Copies - -You are not required to accept this License in order to receive or run a copy of the -Program. Ancillary propagation of a covered work occurring solely as a consequence of -using peer-to-peer transmission to receive a copy likewise does not require -acceptance. However, nothing other than this License grants you permission to -propagate or modify any covered work. These actions infringe copyright if you do not -accept this License. Therefore, by modifying or propagating a covered work, you -indicate your acceptance of this License to do so. - -### 10. Automatic Licensing of Downstream Recipients - -Each time you convey a covered work, the recipient automatically receives a license -from the original licensors, to run, modify and propagate that work, subject to this -License. You are not responsible for enforcing compliance by third parties with this -License. - -An “entity transaction” is a transaction transferring control of an -organization, or substantially all assets of one, or subdividing an organization, or -merging organizations. If propagation of a covered work results from an entity -transaction, each party to that transaction who receives a copy of the work also -receives whatever licenses to the work the party's predecessor in interest had or -could give under the previous paragraph, plus a right to possession of the -Corresponding Source of the work from the predecessor in interest, if the predecessor -has it or can get it with reasonable efforts. - -You may not impose any further restrictions on the exercise of the rights granted or -affirmed under this License. For example, you may not impose a license fee, royalty, -or other charge for exercise of rights granted under this License, and you may not -initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging -that any patent claim is infringed by making, using, selling, offering for sale, or -importing the Program or any portion of it. - -### 11. Patents - -A “contributor” is a copyright holder who authorizes use under this -License of the Program or a work on which the Program is based. The work thus -licensed is called the contributor's “contributor version”. - -A contributor's “essential patent claims” are all patent claims owned or -controlled by the contributor, whether already acquired or hereafter acquired, that -would be infringed by some manner, permitted by this License, of making, using, or -selling its contributor version, but do not include claims that would be infringed -only as a consequence of further modification of the contributor version. For -purposes of this definition, “control” includes the right to grant patent -sublicenses in a manner consistent with the requirements of this License. - -Each contributor grants you a non-exclusive, worldwide, royalty-free patent license -under the contributor's essential patent claims, to make, use, sell, offer for sale, -import and otherwise run, modify and propagate the contents of its contributor -version. - -In the following three paragraphs, a “patent license” is any express -agreement or commitment, however denominated, not to enforce a patent (such as an -express permission to practice a patent or covenant not to sue for patent -infringement). 
To “grant” such a patent license to a party means to make -such an agreement or commitment not to enforce a patent against the party. - -If you convey a covered work, knowingly relying on a patent license, and the -Corresponding Source of the work is not available for anyone to copy, free of charge -and under the terms of this License, through a publicly available network server or -other readily accessible means, then you must either **(1)** cause the Corresponding -Source to be so available, or **(2)** arrange to deprive yourself of the benefit of the -patent license for this particular work, or **(3)** arrange, in a manner consistent with -the requirements of this License, to extend the patent license to downstream -recipients. “Knowingly relying” means you have actual knowledge that, but -for the patent license, your conveying the covered work in a country, or your -recipient's use of the covered work in a country, would infringe one or more -identifiable patents in that country that you have reason to believe are valid. - -If, pursuant to or in connection with a single transaction or arrangement, you -convey, or propagate by procuring conveyance of, a covered work, and grant a patent -license to some of the parties receiving the covered work authorizing them to use, -propagate, modify or convey a specific copy of the covered work, then the patent -license you grant is automatically extended to all recipients of the covered work and -works based on it. - -A patent license is “discriminatory” if it does not include within the -scope of its coverage, prohibits the exercise of, or is conditioned on the -non-exercise of one or more of the rights that are specifically granted under this -License. You may not convey a covered work if you are a party to an arrangement with -a third party that is in the business of distributing software, under which you make -payment to the third party based on the extent of your activity of conveying the -work, and under which the third party grants, to any of the parties who would receive -the covered work from you, a discriminatory patent license **(a)** in connection with -copies of the covered work conveyed by you (or copies made from those copies), or **(b)** -primarily for and in connection with specific products or compilations that contain -the covered work, unless you entered into that arrangement, or that patent license -was granted, prior to 28 March 2007. - -Nothing in this License shall be construed as excluding or limiting any implied -license or other defenses to infringement that may otherwise be available to you -under applicable patent law. - -### 12. No Surrender of Others' Freedom - -If conditions are imposed on you (whether by court order, agreement or otherwise) -that contradict the conditions of this License, they do not excuse you from the -conditions of this License. If you cannot convey a covered work so as to satisfy -simultaneously your obligations under this License and any other pertinent -obligations, then as a consequence you may not convey it at all. For example, if you -agree to terms that obligate you to collect a royalty for further conveying from -those to whom you convey the Program, the only way you could satisfy both those terms -and this License would be to refrain entirely from conveying the Program. - -### 13. 
Use with the GNU Affero General Public License - -Notwithstanding any other provision of this License, you have permission to link or -combine any covered work with a work licensed under version 3 of the GNU Affero -General Public License into a single combined work, and to convey the resulting work. -The terms of this License will continue to apply to the part which is the covered -work, but the special requirements of the GNU Affero General Public License, section -13, concerning interaction through a network will apply to the combination as such. - -### 14. Revised Versions of this License - -The Free Software Foundation may publish revised and/or new versions of the GNU -General Public License from time to time. Such new versions will be similar in spirit -to the present version, but may differ in detail to address new problems or concerns. - -Each version is given a distinguishing version number. If the Program specifies that -a certain numbered version of the GNU General Public License “or any later -version” applies to it, you have the option of following the terms and -conditions either of that numbered version or of any later version published by the -Free Software Foundation. If the Program does not specify a version number of the GNU -General Public License, you may choose any version ever published by the Free -Software Foundation. - -If the Program specifies that a proxy can decide which future versions of the GNU -General Public License can be used, that proxy's public statement of acceptance of a -version permanently authorizes you to choose that version for the Program. - -Later license versions may give you additional or different permissions. However, no -additional obligations are imposed on any author or copyright holder as a result of -your choosing to follow a later version. - -### 15. Disclaimer of Warranty - -THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. -EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER -EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE -QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE -DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - -### 16. Limitation of Liability - -IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY -COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS -PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, -INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE -PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE -OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE -WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - -### 17. Interpretation of Sections 15 and 16 - -If the disclaimer of warranty and limitation of liability provided above cannot be -given local legal effect according to their terms, reviewing courts shall apply local -law that most closely approximates an absolute waiver of all civil liability in -connection with the Program, unless a warranty or assumption of liability accompanies -a copy of the Program in return for a fee. 
- -_END OF TERMS AND CONDITIONS_ - -## How to Apply These Terms to Your New Programs - -If you develop a new program, and you want it to be of the greatest possible use to -the public, the best way to achieve this is to make it free software which everyone -can redistribute and change under these terms. - -To do so, attach the following notices to the program. It is safest to attach them -to the start of each source file to most effectively state the exclusion of warranty; -and each file should have at least the “copyright” line and a pointer to -where the full notice is found. - - - Copyright (C) - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . - -Also add information on how to contact you by electronic and paper mail. - -If the program does terminal interaction, make it output a short notice like this -when it starts in an interactive mode: - - Copyright (C) - This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type 'show c' for details. - -The hypothetical commands `show w` and `show c` should show the appropriate parts of -the General Public License. Of course, your program's commands might be different; -for a GUI interface, you would use an “about box”. - -You should also get your employer (if you work as a programmer) or school, if any, to -sign a “copyright disclaimer” for the program, if necessary. For more -information on this, and how to apply and follow the GNU GPL, see -<>. - -The GNU General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may consider it -more useful to permit linking proprietary applications with the library. If this is -what you want to do, use the GNU Lesser General Public License instead of this -License. But first, please read -<>. +# MIT License + +Copyright (c) 2024 Giotto authors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
diff --git a/R/auxiliary_giotto.R b/R/auxiliary_giotto.R index b09798683..430f48979 100644 --- a/R/auxiliary_giotto.R +++ b/R/auxiliary_giotto.R @@ -1,5 +1,3 @@ - - ## Giotto auxiliary functions #### @@ -11,48 +9,49 @@ #' @param mymatrix matrix of expression info #' @param detection_threshold detection threshold. Defaults to 1 count. #' @keywords internal -.mean_expr_det_test = function(mymatrix, detection_threshold = 1) { - unlist(apply(X = mymatrix, MARGIN = 1, FUN = function(x) { - detected_x = x[x > detection_threshold] - mean(detected_x) - })) +.mean_expr_det_test <- function(mymatrix, detection_threshold = 1) { + unlist(apply(X = mymatrix, MARGIN = 1, FUN = function(x) { + detected_x <- x[x > detection_threshold] + mean(detected_x) + })) } #' @title Normalize expression matrix for library size #' @param mymatrix matrix object #' @param scalefactor scalefactor #' @keywords internal -.lib_norm_giotto = function(mymatrix, scalefactor){ - - libsizes = colSums_flex(mymatrix) - - if(any(libsizes == 0)) { - warning(wrap_txt('Total library size or counts for individual spat units are 0. - This will likely result in normalization problems. - filter (filterGiotto) or impute (imputeGiotto) spatial units.')) - } +.lib_norm_giotto <- function(mymatrix, scalefactor) { + libsizes <- colSums_flex(mymatrix) + + if (any(libsizes == 0)) { + warning(wrap_txt("Total library size or counts for individual spat + units are 0. + This will likely result in normalization problems. + filter (filterGiotto) or impute (imputeGiotto) spatial + units.")) + } - norm_expr = t_flex(t_flex(mymatrix)/ libsizes)*scalefactor - return(norm_expr) + norm_expr <- t_flex(t_flex(mymatrix) / libsizes) * scalefactor + return(norm_expr) } #' @title Log normalize expression matrix #' @keywords internal -.log_norm_giotto = function(mymatrix, base, offset) { - - if(methods::is(mymatrix, 'DelayedArray')) { - mymatrix = log(mymatrix + offset)/log(base) - # } else if(methods::is(mymatrix, 'DelayedMatrix')) { - # mymatrix = log(mymatrix + offset)/log(base) - } else if(methods::is(mymatrix, 'dgCMatrix')) { - mymatrix@x = log(mymatrix@x + offset)/log(base) # replace with sparseMatrixStats - } else if(methods::is(mymatrix, 'Matrix')) { - mymatrix@x = log(mymatrix@x + offset)/log(base) - } else { - mymatrix = log(as.matrix(mymatrix) + offset)/log(base) - } - - return(mymatrix) +.log_norm_giotto <- function(mymatrix, base, offset) { + if (methods::is(mymatrix, "DelayedArray")) { + mymatrix <- log(mymatrix + offset) / log(base) + # } else if(methods::is(mymatrix, 'DelayedMatrix')) { + # mymatrix = log(mymatrix + offset)/log(base) + } else if (methods::is(mymatrix, "dgCMatrix")) { + mymatrix@x <- log(mymatrix@x + offset) / log(base) + # replace with sparseMatrixStats + } else if (methods::is(mymatrix, "Matrix")) { + mymatrix@x <- log(mymatrix@x + offset) / log(base) + } else { + mymatrix <- log(as.matrix(mymatrix) + offset) / log(base) + } + + return(mymatrix) } @@ -73,7 +72,8 @@ #' @title filterDistributions #' @name filterDistributions -#' @description show gene or cell distribution after filtering on expression threshold +#' @description show gene or cell distribution after filtering on expression +#' threshold #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -86,14 +86,18 @@ #' @param nr_bins number of bins for histogram plot #' @param fill_color fill color for plots #' @param scale_axis ggplot transformation for axis (e.g. 
log2) -#' @param axis_offset offset to be used together with the scaling transformation +#' @param axis_offset offset to be used together with the scaling +#' transformation #' @param show_plot logical. show plot #' @param return_plot logical. return ggplot object #' @param save_plot logical. directly save the plot -#' @param save_param list of saving parameters from [GiottoVisuals::all_plots_save_function] -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters from +#' [GiottoVisuals::all_plots_save_function] +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @details -#' There are 3 ways to create a distribution profile and summarize it for either the features or the cells (spatial units) \cr +#' There are 3 ways to create a distribution profile and summarize it for +#' either the features or the cells (spatial units) \cr #' \itemize{ #' \item{1. threshold: calculate features that cross a thresold (default)} #' \item{2. sum: summarize the features, i.e. total of a feature} @@ -103,290 +107,332 @@ #' @md #' @export filterDistributions <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('raw', 'normalized', 'scaled', 'custom'), - method = c('threshold', 'sum', 'mean'), - expression_threshold = 1, - detection = c('feats', 'cells'), - plot_type = c('histogram', 'violin'), - scale_y = NULL, - nr_bins = 30, - fill_color = 'lightblue', - scale_axis = 'identity', - axis_offset = 0, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'filterDistributions') { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - values = match.arg(expression_values, unique(c('raw', 'normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - # plot distribution for feats or cells - detection = match.arg(detection, c('feats', 'cells')) - - # method to calculate distribution - method = match.arg(method, c('threshold', 'sum', 'mean')) - - # plot type - plot_type = match.arg(plot_type, c('histogram', 'violin')) - - # variables - V1 = NULL - - # for genes - if(detection == 'feats') { - - if(method == 'threshold') { - feat_detection_levels = data.table::as.data.table(rowSums_flex(expr_values >= expression_threshold)) - mytitle = 'feat detected in # of cells' - } else if(method == 'sum') { - feat_detection_levels = data.table::as.data.table(rowSums_flex(expr_values)) - mytitle = 'total sum of feature detected in all cells' - } else if(method == 'mean') { - feat_detection_levels = data.table::as.data.table(rowMeans_flex(expr_values)) - mytitle = 'average of feature detected in all cells' - } - - y_title = 'count' - if(!is.null(scale_y)) { - feat_detection_levels[, V1 := do.call(what = scale_y, list(V1))] - y_title = paste(scale_y, y_title) - } - - - - if(plot_type == 'violin') { - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_violin(data = feat_detection_levels, ggplot2::aes(x = 'feats', y = V1+axis_offset), - fill = fill_color) - pl <- pl + ggplot2::scale_y_continuous(trans = scale_axis) - pl 
<- pl + ggplot2::labs(y = mytitle, x = '') - - } else if(plot_type == 'histogram') { - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_histogram(data = feat_detection_levels, ggplot2::aes(x = V1+axis_offset), - color = 'white', bins = nr_bins, fill = fill_color) - pl <- pl + ggplot2::scale_x_continuous(trans = scale_axis) - pl <- pl + ggplot2::labs(x = mytitle, y = y_title) - - } - - # for cells - } else if(detection == 'cells') { - + feat_type = NULL, + spat_unit = NULL, + expression_values = c("raw", "normalized", "scaled", "custom"), + method = c("threshold", "sum", "mean"), + expression_threshold = 1, + detection = c("feats", "cells"), + plot_type = c("histogram", "violin"), + scale_y = NULL, + nr_bins = 30, + fill_color = "lightblue", + scale_axis = "identity", + axis_offset = 0, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "filterDistributions") { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - if(method == 'threshold') { - cell_detection_levels = data.table::as.data.table(colSums_flex(expr_values >= expression_threshold)) - mytitle = 'feats detected per cell' - } else if(method == 'sum') { - cell_detection_levels = data.table::as.data.table(colSums_flex(expr_values)) - mytitle = 'total features per cell' - } else if(method == 'mean') { - cell_detection_levels = data.table::as.data.table(colMeans_flex(expr_values)) - mytitle = 'average number of features per cell' - } + # expression values to be used + values <- match.arg( + expression_values, + unique(c("raw", "normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) - y_title = 'count' - if(!is.null(scale_y)) { - cell_detection_levels[, V1 := do.call(what = scale_y, list(V1))] - y_title = paste(scale_y, y_title) + # plot distribution for feats or cells + detection <- match.arg(detection, c("feats", "cells")) + + # method to calculate distribution + method <- match.arg(method, c("threshold", "sum", "mean")) + + # plot type + plot_type <- match.arg(plot_type, c("histogram", "violin")) + + # variables + V1 <- NULL + + # for genes + if (detection == "feats") { + if (method == "threshold") { + feat_detection_levels <- data.table::as.data.table( + rowSums_flex(expr_values >= expression_threshold)) + mytitle <- "feat detected in # of cells" + } else if (method == "sum") { + feat_detection_levels <- data.table::as.data.table( + rowSums_flex(expr_values)) + mytitle <- "total sum of feature detected in all cells" + } else if (method == "mean") { + feat_detection_levels <- data.table::as.data.table( + rowMeans_flex(expr_values)) + mytitle <- "average of feature detected in all cells" + } + + y_title <- "count" + if (!is.null(scale_y)) { + feat_detection_levels[, V1 := do.call(what = scale_y, list(V1))] + y_title <- paste(scale_y, y_title) + } + + + + if (plot_type == "violin") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_violin( + data = feat_detection_levels, + ggplot2::aes(x = "feats", y = V1 + axis_offset), + fill = fill_color + ) + pl <- pl + ggplot2::scale_y_continuous(trans = scale_axis) + pl <- pl + ggplot2::labs(y = mytitle, x = "") + } else if (plot_type == 
"histogram") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_histogram( + data = feat_detection_levels, + ggplot2::aes(x = V1 + axis_offset), + color = "white", bins = nr_bins, fill = fill_color + ) + pl <- pl + ggplot2::scale_x_continuous(trans = scale_axis) + pl <- pl + ggplot2::labs(x = mytitle, y = y_title) + } + + # for cells + } else if (detection == "cells") { + if (method == "threshold") { + cell_detection_levels <- data.table::as.data.table( + colSums_flex(expr_values >= expression_threshold)) + mytitle <- "feats detected per cell" + } else if (method == "sum") { + cell_detection_levels <- data.table::as.data.table( + colSums_flex(expr_values)) + mytitle <- "total features per cell" + } else if (method == "mean") { + cell_detection_levels <- data.table::as.data.table( + colMeans_flex(expr_values)) + mytitle <- "average number of features per cell" + } + + y_title <- "count" + if (!is.null(scale_y)) { + cell_detection_levels[, V1 := do.call(what = scale_y, list(V1))] + y_title <- paste(scale_y, y_title) + } + + + + if (plot_type == "violin") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_violin( + data = cell_detection_levels, + ggplot2::aes(x = "cells", y = V1 + axis_offset), + fill = fill_color + ) + pl <- pl + ggplot2::scale_y_continuous(trans = scale_axis) + pl <- pl + ggplot2::labs(y = mytitle, x = "") + } else if (plot_type == "histogram") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_histogram( + data = cell_detection_levels, + ggplot2::aes(x = V1 + axis_offset), + color = "white", bins = nr_bins, fill = fill_color + ) + pl <- pl + ggplot2::scale_x_continuous(trans = scale_axis) + pl <- pl + ggplot2::labs(x = mytitle, y = y_title) + } } - - - if(plot_type == 'violin') { - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_violin(data = cell_detection_levels, ggplot2::aes(x = 'cells', y = V1+axis_offset), - fill = fill_color) - pl <- pl + ggplot2::scale_y_continuous(trans = scale_axis) - pl <- pl + ggplot2::labs(y = mytitle, x = '') - - } else if(plot_type == 'histogram') { - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_histogram(data = cell_detection_levels, ggplot2::aes(x = V1+axis_offset), - color = 'white', bins = nr_bins, fill = fill_color) - pl <- pl + ggplot2::scale_x_continuous(trans = scale_axis) - pl <- pl + ggplot2::labs(x = mytitle, y = y_title) - - } - } - - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title filterCombinations #' @name filterCombinations -#' @description Shows how many genes and cells are lost with combinations of thresholds. +#' @description Shows how many genes and cells are lost with combinations of +#' thresholds. 
#' @inheritParams data_access_params #' @inheritParams plot_output_params #' @param expression_values expression values to use #' @param expression_thresholds all thresholds to consider a gene expressed -#' @param feat_det_in_min_cells minimum # of cells that need to express a feature -#' @param min_det_feats_per_cell minimum # of features that need to be detected in a cell +#' @param feat_det_in_min_cells minimum # of cells that need to express a +#' feature +#' @param min_det_feats_per_cell minimum # of features that need to be +#' detected in a cell #' @param scale_x_axis ggplot transformation for x-axis (e.g. log2) -#' @param x_axis_offset x-axis offset to be used together with the scaling transformation +#' @param x_axis_offset x-axis offset to be used together with the scaling +#' transformation #' @param scale_y_axis ggplot transformation for y-axis (e.g. log2) -#' @param y_axis_offset y-axis offset to be used together with the scaling transformation +#' @param y_axis_offset y-axis offset to be used together with the scaling +#' transformation #' @return list of data.table and ggplot object -#' @details Creates a scatterplot that visualizes the number of genes and cells that are -#' lost with a specific combination of a gene and cell threshold given an arbitrary cutoff -#' to call a gene expressed. This function can be used to make an informed decision at the -#' filtering step with filterGiotto. +#' @details Creates a scatterplot that visualizes the number of genes and +#' cells that are lost with a specific combination of a gene and cell +#' threshold given an arbitrary cutoff to call a gene expressed. This function +#' can be used to make an informed decision at the filtering step with +#' filterGiotto. #' @export filterCombinations <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('raw', 'normalized', 'scaled', 'custom'), - expression_thresholds = c(1, 2), - feat_det_in_min_cells = c(5, 50), - min_det_feats_per_cell = c(200, 400), - scale_x_axis = 'identity', - x_axis_offset = 0, - scale_y_axis = 'identity', - y_axis_offset = 0, - show_plot = TRUE, - return_plot = FALSE, - save_plot = NULL, - save_param = list(), - default_save_name = 'filterCombinations') { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - # expression values to be used - values = match.arg(expression_values, unique(c('raw', 'normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values)[] - - # feat and cell minimums need to have the same length - if(length(feat_det_in_min_cells) != length(min_det_feats_per_cell)) { - stop('\n feat_det_in_min_cells and min_det_feats_per_cell need to be the same size \n') - } - - # compute the number of removed feats and cells - result_list = list() - for(thresh_i in seq_along(expression_thresholds)) { - - threshold = expression_thresholds[thresh_i] - - det_feats_res = list() - det_cells_res = list() - for(combn_i in seq_along(feat_det_in_min_cells)) { - - min_cells_for_feat = feat_det_in_min_cells[combn_i] - min_feats_per_cell = min_det_feats_per_cell[combn_i] - - - # first remove feats - filter_index_feats = rowSums_flex(expr_values >= threshold) >= min_cells_for_feat - removed_feats = length(filter_index_feats[filter_index_feats == FALSE]) - 
det_cells_res[[combn_i]] = removed_feats - - # then remove cells - filter_index_cells = colSums_flex(expr_values[filter_index_feats, ] >= threshold) >= min_feats_per_cell - removed_cells = length(filter_index_cells[filter_index_cells == FALSE]) - det_feats_res[[combn_i]] = removed_cells + feat_type = NULL, + spat_unit = NULL, + expression_values = c("raw", "normalized", "scaled", "custom"), + expression_thresholds = c(1, 2), + feat_det_in_min_cells = c(5, 50), + min_det_feats_per_cell = c(200, 400), + scale_x_axis = "identity", + x_axis_offset = 0, + scale_y_axis = "identity", + y_axis_offset = 0, + show_plot = TRUE, + return_plot = FALSE, + save_plot = NULL, + save_param = list(), + default_save_name = "filterCombinations") { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("raw", "normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values + )[] + + # feat and cell minimums need to have the same length + if (length(feat_det_in_min_cells) != length(min_det_feats_per_cell)) { + stop("\n feat_det_in_min_cells and min_det_feats_per_cell need to be + the same size \n") + } + + # compute the number of removed feats and cells + result_list <- list() + for (thresh_i in seq_along(expression_thresholds)) { + threshold <- expression_thresholds[thresh_i] + + det_feats_res <- list() + det_cells_res <- list() + for (combn_i in seq_along(feat_det_in_min_cells)) { + min_cells_for_feat <- feat_det_in_min_cells[combn_i] + min_feats_per_cell <- min_det_feats_per_cell[combn_i] + + + # first remove feats + filter_index_feats <- rowSums_flex( + expr_values >= threshold) >= min_cells_for_feat + removed_feats <- length(filter_index_feats[ + filter_index_feats == FALSE]) + det_cells_res[[combn_i]] <- removed_feats + + # then remove cells + filter_index_cells <- colSums_flex(expr_values[ + filter_index_feats, ] >= threshold) >= min_feats_per_cell + removed_cells <- length(filter_index_cells[ + filter_index_cells == FALSE]) + det_feats_res[[combn_i]] <- removed_cells + } + + temp_dt <- data.table::data.table( + "threshold" = threshold, + removed_feats = unlist(det_cells_res), + removed_cells = unlist(det_feats_res) + ) + + result_list[[thresh_i]] <- temp_dt } - temp_dt = data.table::data.table('threshold' = threshold, - removed_feats = unlist(det_cells_res), - removed_cells = unlist(det_feats_res)) - - result_list[[thresh_i]] = temp_dt - - } - - result_DT = do.call('rbind', result_list) - - # data.table variables - # feat_detected_in_min_cells = min_detected_feats_per_cell = combination = NULL - - # data.table variables - feat_detected_in_min_cells = min_detected_feats_per_cell = combination = NULL - - result_DT[['feat_detected_in_min_cells']] = feat_det_in_min_cells - result_DT[['min_detected_feats_per_cell']] = min_det_feats_per_cell - result_DT[['combination']] = paste0(result_DT$feat_detected_in_min_cells,'-',result_DT$min_detected_feats_per_cell) - - result_DT = result_DT[,.(threshold, - feat_detected_in_min_cells, - min_detected_feats_per_cell, - combination, - removed_feats, - removed_cells)] - - maximum_x_value = max(result_DT[['removed_cells']], na.rm = T) - maximum_y_value = max(result_DT[['removed_feats']], na.rm = T) - - 
pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_line(data = result_DT, aes(x = removed_cells+x_axis_offset, - y = removed_feats+y_axis_offset, - group = as.factor(threshold)), linetype = 2) - pl <- pl + ggplot2::geom_point(data = result_DT, aes(x = removed_cells+x_axis_offset, - y = removed_feats+y_axis_offset, - color = as.factor(threshold))) - pl <- pl + scale_color_discrete(guide = guide_legend(title = 'threshold(s)')) - pl <- pl + ggrepel::geom_text_repel(data = result_DT, aes(x = removed_cells+x_axis_offset, - y = removed_feats+y_axis_offset, - label = combination)) - pl <- pl + ggplot2::scale_x_continuous(trans = scale_x_axis, limits = c(0, maximum_x_value)) - pl <- pl + ggplot2::scale_y_continuous(trans = scale_y_axis, limits = c(0, maximum_y_value)) - pl <- pl + ggplot2::labs(x = 'number of removed cells', y = 'number of removed feats') - - - return(plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = list(results = result_DT, ggplot = pl) - )) + result_DT <- do.call("rbind", result_list) + + # data.table variables + feat_detected_in_min_cells <- min_detected_feats_per_cell <- + combination <- NULL + + result_DT[["feat_detected_in_min_cells"]] <- feat_det_in_min_cells + result_DT[["min_detected_feats_per_cell"]] <- min_det_feats_per_cell + result_DT[["combination"]] <- paste0( + result_DT$feat_detected_in_min_cells, "-", + result_DT$min_detected_feats_per_cell) + + result_DT <- result_DT[, .( + threshold, + feat_detected_in_min_cells, + min_detected_feats_per_cell, + combination, + removed_feats, + removed_cells + )] + + maximum_x_value <- max(result_DT[["removed_cells"]], na.rm = TRUE) + maximum_y_value <- max(result_DT[["removed_feats"]], na.rm = TRUE) + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_line(data = result_DT, aes( + x = removed_cells + x_axis_offset, + y = removed_feats + y_axis_offset, + group = as.factor(threshold) + ), linetype = 2) + pl <- pl + ggplot2::geom_point(data = result_DT, aes( + x = removed_cells + x_axis_offset, + y = removed_feats + y_axis_offset, + color = as.factor(threshold) + )) + pl <- pl + scale_color_discrete( + guide = guide_legend(title = "threshold(s)")) + pl <- pl + ggrepel::geom_text_repel(data = result_DT, aes( + x = removed_cells + x_axis_offset, + y = removed_feats + y_axis_offset, + label = combination + )) + pl <- pl + ggplot2::scale_x_continuous( + trans = scale_x_axis, limits = c(0, maximum_x_value)) + pl <- pl + ggplot2::scale_y_continuous( + trans = scale_y_axis, limits = c(0, maximum_y_value)) + pl <- pl + ggplot2::labs( + x = "number of removed cells", y = "number of removed feats") + + + return(plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = list(results = result_DT, ggplot = pl) + )) } @@ -402,8 +448,10 @@ filterCombinations <- function(gobject, #' other feat_types provided. 
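# A hedged usage sketch for filterCombinations() as restyled above: `g` stands
# for an existing giotto object and every value is illustrative, so the call
# is left commented out. Note that feat_det_in_min_cells and
# min_det_feats_per_cell must be the same length.
# res <- filterCombinations(
#     gobject = g,
#     expression_thresholds = c(1, 2),
#     feat_det_in_min_cells = c(5, 20, 50),
#     min_det_feats_per_cell = c(200, 400, 600)
# )
# res$results  # removed feats/cells per combination, when the list is returned
# res$ggplot   # the ggplot assembled above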
#' @param expression_values expression values to use #' @param expression_threshold threshold to consider a gene expressed -#' @param feat_det_in_min_cells minimum # of cells that need to express a feature -#' @param min_det_feats_per_cell minimum # of features that need to be detected in a cell +#' @param feat_det_in_min_cells minimum # of cells that need to express a +#' feature +#' @param min_det_feats_per_cell minimum # of features that need to be detected +#' in a cell #' @param all_spat_units deprecated. Use spat_unit_fsub = ":all:" #' @param all_feat_types deprecated. Use feat_type_ssub = ":all:" #' @param spat_unit_fsub character vector. (default = ':all:') limit features @@ -418,193 +466,223 @@ filterCombinations <- function(gobject, #' @param verbose verbose #' #' @return giotto object -#' @details The function \code{\link{filterCombinations}} can be used to explore the effect of different parameter values. -#' Please note that this function filters data in a predefined order, features, then cells. -#' After filtering in this order, certain features may be left over in the metadata with a -#' corresponding number of cells which is less than that of the threshold value of cells, +#' @details The function \code{\link{filterCombinations}} can be used to +#' explore the effect of different parameter values. +#' Please note that this function filters data in a predefined order, features, +#' then cells. +#' After filtering in this order, certain features may be left over in the +#' metadata with a corresponding number of cells which is less than that of +#' the threshold value of cells, #' feat_det_in_min_cells. This behavior is explained in detail here: #' \url{https://github.com/drieslab/Giotto/issues/500#issuecomment-1396083446} #' @export -filterGiotto = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('raw', 'normalized', 'scaled', 'custom'), - expression_threshold = 1, - feat_det_in_min_cells = 100, - min_det_feats_per_cell = 100, - spat_unit_fsub = ":all:", - feat_type_ssub = ":all:", - all_spat_units = NULL, - all_feat_types = NULL, - poly_info = NULL, - tag_cells = FALSE, - tag_cell_name = 'tag', - tag_feats = FALSE, - tag_feats_name = 'tag', - verbose = TRUE) { - - # data.table vars - cell_ID = feat_ID = NULL - - # handle deprecations - if (!is.null(all_spat_units)) { - if (all_spat_units) spat_unit_fsub = ":all:" - else spat_unit_fsub = spat_unit - - warning(wrap_txt( - 'filterGiotto: +filterGiotto <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("raw", "normalized", "scaled", "custom"), + expression_threshold = 1, + feat_det_in_min_cells = 100, + min_det_feats_per_cell = 100, + spat_unit_fsub = ":all:", + feat_type_ssub = ":all:", + all_spat_units = NULL, + all_feat_types = NULL, + poly_info = NULL, + tag_cells = FALSE, + tag_cell_name = "tag", + tag_feats = FALSE, + tag_feats_name = "tag", + verbose = TRUE) { + # data.table vars + cell_ID <- feat_ID <- NULL + + # handle deprecations + if (!is.null(all_spat_units)) { + if (all_spat_units) { + spat_unit_fsub <- ":all:" + } else { + spat_unit_fsub <- spat_unit + } + + warning(wrap_txt( + 'filterGiotto: all_spat_units param is deprecated. Please use spat_unit_fsub = \":all:\" instead. (this is the default)' - )) - } - if (!is.null(all_feat_types)) { - if (all_feat_types) feat_type_ssub = ":all:" - else feat_type_ssub = feat_type - - warning(wrap_txt( - 'filterGiotto: - all_feat_types param is deprecated. - Please use feat_type_ssub = \":all:\" instead. 
(this is the default)' - )) - } - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - # set poly_info - if(is.null(poly_info)) { - poly_info = spat_unit - } - - if (verbose && length(spat_unit) > 1L) { - wrap_msg("More than one spat_unit provided.\n", - paste0("[", spat_unit[[1L]], "]"), - "filtering results will be applied across spat_units:", spat_unit) - } - if (verbose && length(feat_type) > 1L) { - wrap_msg("More than one feat_type provided.\n", - paste0("[", feat_type[[1L]], "]"), - "filtering results will be applied across spat_units:", feat_type) - } - - - # expression values to be used - values = match.arg(expression_values, unique(c('raw', 'normalized', 'scaled', 'custom', expression_values))) - - # get expression values to perform filtering on - # Only the first spat_unit and feat_type provided are filtered. - # IF there are additional spat_units and feat_types provided, then the filtering - # results from this round will be applied to the other provided spat_units - # and feat_types as well. - expr_values = get_expression_values( - gobject = gobject, - spat_unit = spat_unit[[1L]], - feat_type = feat_type[[1L]], - values = values, - output = 'matrix' - ) - - # approach: - # 1. first remove genes that are not frequently detected - # 2. then remove cells that do not have sufficient detected genes - - ## filter features - filter_index_feats = rowSums_flex(expr_values >= expression_threshold) >= feat_det_in_min_cells - selected_feat_ids = names(filter_index_feats[filter_index_feats == TRUE]) - - - - ## filter cells - filter_index_cells = colSums_flex(expr_values[filter_index_feats, ] >= expression_threshold) >= min_det_feats_per_cell - selected_cell_ids = names(filter_index_cells[filter_index_cells == TRUE]) - - - - # update cell metadata - if(isTRUE(tag_cells)) { - cell_meta = getCellMetadata(gobject = gobject, copy_obj = TRUE) - cell_meta[][, c(tag_cell_name) := ifelse(cell_ID %in% selected_cell_ids, 0, 1)] - gobject = setCellMetadata( - gobject = gobject, x = cell_meta, initialize = FALSE + )) + } + if (!is.null(all_feat_types)) { + if (all_feat_types) { + feat_type_ssub <- ":all:" + } else { + feat_type_ssub <- feat_type + } + + warning(wrap_txt( + 'filterGiotto: all_feat_types param is deprecated. + Please use feat_type_ssub = \":all:\" instead. 
+ (this is the default)' + )) + } + + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + # set poly_info + if (is.null(poly_info)) { + poly_info <- spat_unit + } + + if (verbose && length(spat_unit) > 1L) { + wrap_msg( + "More than one spat_unit provided.\n", + paste0("[", spat_unit[[1L]], "]"), + "filtering results will be applied across spat_units:", spat_unit + ) + } + if (verbose && length(feat_type) > 1L) { + wrap_msg( + "More than one feat_type provided.\n", + paste0("[", feat_type[[1L]], "]"), + "filtering results will be applied across spat_units:", feat_type + ) + } - # set selected cells back to all cells - selected_cell_ids = names(filter_index_cells) - } - if(isTRUE(tag_feats)) { - feat_meta = getFeatureMetadata(gobject = gobject, copy_obj = TRUE) - feat_meta[][, c(tag_feats_name) := ifelse(feat_ID %in% selected_feat_ids, 0, 1)] - gobject = setFeatureMetadata( - gobject = gobject, x = feat_meta, initialize = FALSE + # expression values to be used + values <- match.arg( + expression_values, + unique(c("raw", "normalized", "scaled", "custom", expression_values))) + + # get expression values to perform filtering on + # Only the first spat_unit and feat_type provided are filtered. + # IF there are additional spat_units and feat_types provided, then the + # filtering + # results from this round will be applied to the other provided spat_units + # and feat_types as well. + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit[[1L]], + feat_type = feat_type[[1L]], + values = values, + output = "matrix" ) - # set selected feats back to all feats - selected_feat_ids = names(filter_index_feats) - } + # approach: + # 1. first remove genes that are not frequently detected + # 2. 
then remove cells that do not have sufficient detected genes + ## filter features + filter_index_feats <- rowSums_flex( + expr_values >= expression_threshold) >= feat_det_in_min_cells + selected_feat_ids <- names(filter_index_feats[filter_index_feats == TRUE]) - # update feature metadata - newGiottoObject = subsetGiotto( - gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = selected_cell_ids, - feat_ids = selected_feat_ids, - spat_unit_fsub = spat_unit_fsub, - feat_type_ssub = feat_type_ssub, - poly_info = poly_info, - verbose = verbose - ) - ## print output ## - removed_feats = length(filter_index_feats[filter_index_feats == FALSE]) - total_feats = length(filter_index_feats) + ## filter cells + filter_index_cells <- colSums_flex(expr_values[ + filter_index_feats, ] >= expression_threshold) >= min_det_feats_per_cell + selected_cell_ids <- names(filter_index_cells[filter_index_cells == TRUE]) - removed_cells = length(filter_index_cells[filter_index_cells == FALSE]) - total_cells = length(filter_index_cells) - if(isTRUE(verbose)) { - cat('\n') - cat('Feature type: ', feat_type, '\n') - if(isTRUE(tag_cells)) { - cat('Number of cells tagged: ', removed_cells, ' out of ', total_cells, '\n') - } else { - cat('Number of cells removed: ', removed_cells, ' out of ', total_cells, '\n') + # update cell metadata + if (isTRUE(tag_cells)) { + cell_meta <- getCellMetadata(gobject = gobject, copy_obj = TRUE) + cell_meta[][, c(tag_cell_name) := ifelse( + cell_ID %in% selected_cell_ids, 0, 1)] + gobject <- setCellMetadata( + gobject = gobject, x = cell_meta, initialize = FALSE + ) + + # set selected cells back to all cells + selected_cell_ids <- names(filter_index_cells) } - if(isTRUE(tag_feats)) { - cat('Number of feats tagged: ', removed_feats, ' out of ', total_feats, '\n') - } else { - cat('Number of feats removed: ', removed_feats, ' out of ', total_feats, '\n') + if (isTRUE(tag_feats)) { + feat_meta <- getFeatureMetadata(gobject = gobject, copy_obj = TRUE) + feat_meta[][, c(tag_feats_name) := ifelse( + feat_ID %in% selected_feat_ids, 0, 1)] + gobject <- setFeatureMetadata( + gobject = gobject, x = feat_meta, initialize = FALSE + ) + + # set selected feats back to all feats + selected_feat_ids <- names(filter_index_feats) } - } - ## update parameters used ## - # Do not update downstream of processGiotto - # Parameters will be updated within processGiotto - try({ - upstream_func = sys.call(-2) - fname = as.character(upstream_func[[1]]) - if (fname == 'processGiotto') return(newGiottoObject) - }, - silent = TRUE) + # update feature metadata + newGiottoObject <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = selected_cell_ids, + feat_ids = selected_feat_ids, + spat_unit_fsub = spat_unit_fsub, + feat_type_ssub = feat_type_ssub, + poly_info = poly_info, + verbose = verbose + ) + + ## print output ## + removed_feats <- length(filter_index_feats[filter_index_feats == FALSE]) + total_feats <- length(filter_index_feats) + + removed_cells <- length(filter_index_cells[filter_index_cells == FALSE]) + total_cells <- length(filter_index_cells) + + if (isTRUE(verbose)) { + cat("\n") + cat("Feature type: ", feat_type, "\n") + + if (isTRUE(tag_cells)) { + cat("Number of cells tagged: ", removed_cells, " out of ", + total_cells, "\n") + } else { + cat("Number of cells removed: ", removed_cells, " out of ", + total_cells, "\n") + } + + if (isTRUE(tag_feats)) { + cat("Number of feats tagged: ", removed_feats, " out of ", + total_feats, "\n") 
+ } else { + cat("Number of feats removed: ", removed_feats, " out of ", + total_feats, "\n") + } + } - # If this function call is not downstream of processGiotto, update normally - newGiottoObject = update_giotto_params(newGiottoObject, description = '_filter') + ## update parameters used ## - return(newGiottoObject) + # Do not update downstream of processGiotto + # Parameters will be updated within processGiotto + try( + { + upstream_func <- sys.call(-2) + fname <- as.character(upstream_func[[1]]) + if (fname == "processGiotto") { + return(newGiottoObject) + } + }, + silent = TRUE + ) + # If this function call is not downstream of processGiotto, update normally + newGiottoObject <- update_giotto_params( + newGiottoObject, description = "_filter") + + return(newGiottoObject) } @@ -617,135 +695,139 @@ filterGiotto = function(gobject, #' @name .rna_standard_normalization #' @description standard function for RNA normalization #' @keywords internal -.rna_standard_normalization = function(gobject, - raw_expr, - feat_type, - spat_unit, - library_size_norm = TRUE, - scalefactor = 6e3, - log_norm = TRUE, - log_offset = 1, - logbase = 2, - scale_feats = TRUE, - scale_cells = TRUE, - scale_order = c('first_feats', 'first_cells'), - verbose = TRUE) { - - # check feature type compatibility - if(!feat_type %in% c('rna', 'RNA')) { - warning('Caution: Standard normalization was developed for RNA data \n') - } - - # evaluate provenance before modifying raw_expr in case h5_file exists - if(isS4(raw_expr)) { - provenance = raw_expr@provenance - } else {provenance = NULL} +.rna_standard_normalization <- function(gobject, + raw_expr, + feat_type, + spat_unit, + library_size_norm = TRUE, + scalefactor = 6e3, + log_norm = TRUE, + log_offset = 1, + logbase = 2, + scale_feats = TRUE, + scale_cells = TRUE, + scale_order = c("first_feats", "first_cells"), + verbose = TRUE) { + # check feature type compatibility + if (!feat_type %in% c("rna", "RNA")) { + warning("Caution: Standard normalization was developed for RNA data \n") + } + # evaluate provenance before modifying raw_expr in case h5_file exists + if (isS4(raw_expr)) { + provenance <- raw_expr@provenance + } else { + provenance <- NULL + } - feat_names = rownames(raw_expr[]) - col_names = colnames(raw_expr[]) + feat_names <- rownames(raw_expr[]) + col_names <- colnames(raw_expr[]) - ## 1. library size normalize - if(library_size_norm == TRUE) { - norm_expr = .lib_norm_giotto(mymatrix = raw_expr[], - scalefactor = scalefactor) - } else { - norm_expr = raw_expr[] - } - ## 2. lognormalize - if(log_norm == TRUE) { - norm_expr = .log_norm_giotto(mymatrix = norm_expr, - base = logbase, - offset = log_offset) - } + ## 1. library size normalize + if (library_size_norm == TRUE) { + norm_expr <- .lib_norm_giotto( + mymatrix = raw_expr[], + scalefactor = scalefactor + ) + } else { + norm_expr <- raw_expr[] + } - ## 3. scale - if(scale_feats == TRUE & scale_cells == TRUE) { + ## 2. lognormalize + if (log_norm == TRUE) { + norm_expr <- .log_norm_giotto( + mymatrix = norm_expr, + base = logbase, + offset = log_offset + ) + } - scale_order = match.arg(arg = scale_order, choices = c('first_feats', 'first_cells')) + ## 3. 
scale + if (scale_feats == TRUE & scale_cells == TRUE) { + scale_order <- match.arg( + arg = scale_order, choices = c("first_feats", "first_cells")) - if(scale_order == 'first_feats') { - if(isTRUE(verbose)) wrap_msg('\n first scale feats and then cells \n') + if (scale_order == "first_feats") { + if (isTRUE(verbose)) + wrap_msg("\n first scale feats and then cells \n") - norm_scaled_expr = t_flex(standardise_flex(x = t_flex(norm_expr), center = TRUE, scale = TRUE)) - norm_scaled_expr = standardise_flex(x = norm_scaled_expr, center = TRUE, scale = TRUE) + norm_scaled_expr <- t_flex(standardise_flex( + x = t_flex(norm_expr), center = TRUE, scale = TRUE)) + norm_scaled_expr <- standardise_flex( + x = norm_scaled_expr, center = TRUE, scale = TRUE) - #if(!methods::is(norm_expr, class2 = 'matrix')) norm_expr = as.matrix(norm_expr) - #norm_scaled_expr = t(Rfast::standardise(x = t(norm_expr), center = TRUE, scale = TRUE)) - #norm_scaled_expr = Rfast::standardise(x = norm_scaled_expr, center = TRUE, scale = TRUE) + } else if (scale_order == "first_cells") { + if (isTRUE(verbose)) + wrap_msg("\n first scale cells and then feats \n") - } else if(scale_order == 'first_cells') { - if(isTRUE(verbose)) wrap_msg('\n first scale cells and then feats \n') + norm_scaled_expr <- standardise_flex( + x = norm_expr, center = TRUE, scale = TRUE) + norm_scaled_expr <- t_flex(standardise_flex( + x = t_flex(norm_scaled_expr), center = TRUE, scale = TRUE)) - norm_scaled_expr = standardise_flex(x = norm_expr, center = TRUE, scale = TRUE) - norm_scaled_expr = t_flex(standardise_flex(x = t_flex(norm_scaled_expr), center = TRUE, scale = TRUE)) + } else { + stop("\n scale order must be given \n") + } + } else if (scale_feats == TRUE) { + norm_scaled_expr <- t_flex(standardise_flex( + x = t_flex(norm_expr), center = TRUE, scale = TRUE)) - #if(!methods::is(norm_expr, class2 = 'matrix')) norm_expr = as.matrix(norm_expr) - #norm_scaled_expr = Rfast::standardise(x = norm_expr, center = TRUE, scale = TRUE) - #norm_scaled_expr = t(Rfast::standardise(x = t(norm_scaled_expr), center = TRUE, scale = TRUE)) + } else if (scale_cells == TRUE) { + norm_scaled_expr <- standardise_flex( + x = norm_expr, center = TRUE, scale = TRUE) } else { - stop('\n scale order must be given \n') + norm_scaled_expr <- NULL } - } else if(scale_feats == TRUE) { - - norm_scaled_expr = t_flex(standardise_flex(x = t_flex(norm_expr), center = TRUE, scale = TRUE)) - - #if(!methods::is(norm_expr, class2 = 'matrix')) norm_expr = as.matrix(norm_expr) - #norm_scaled_expr = t(Rfast::standardise(x = t(norm_expr), center = TRUE, scale = TRUE)) - - } else if(scale_cells == TRUE) { - - norm_scaled_expr = standardise_flex(x = norm_expr, center = TRUE, scale = TRUE) - - #if(!methods::is(norm_expr, class2 = 'matrix')) norm_expr = as.matrix(norm_expr) - #norm_scaled_expr = Rfast::standardise(x = norm_expr, center = TRUE, scale = TRUE) - - } else { - norm_scaled_expr = NULL - } + ## 4. add cell and gene names back + if (!is.null(norm_expr)) { + rownames(norm_expr) <- feat_names + colnames(norm_expr) <- col_names + } + if (!is.null(norm_scaled_expr)) { + rownames(norm_scaled_expr) <- feat_names + colnames(norm_scaled_expr) <- col_names + } - ## 4. add cell and gene names back - if(!is.null(norm_expr)) { - rownames(norm_expr) = feat_names - colnames(norm_expr) = col_names - } - if(!is.null(norm_scaled_expr)) { - rownames(norm_scaled_expr) = feat_names - colnames(norm_scaled_expr) = col_names - } - - ## 5. 
create and set exprObj - norm_expr = create_expr_obj(name = 'normalized', - exprMat = norm_expr, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = provenance, - misc = NULL) + ## 5. create and set exprObj + norm_expr <- create_expr_obj( + name = "normalized", + exprMat = norm_expr, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = provenance, + misc = NULL + ) - norm_scaled_expr = create_expr_obj(name = 'scaled', - exprMat = norm_scaled_expr, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = provenance, - misc = NULL) + norm_scaled_expr <- create_expr_obj( + name = "scaled", + exprMat = norm_scaled_expr, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = provenance, + misc = NULL + ) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_expression_values(gobject = gobject, - values = norm_expr) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_expression_values( + gobject = gobject, + values = norm_expr + ) - gobject = set_expression_values(gobject = gobject, - values = norm_scaled_expr) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_expression_values( + gobject = gobject, + values = norm_scaled_expr + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - ## 6. return Giotto object - return(gobject) + ## 6. return Giotto object + return(gobject) } @@ -754,116 +836,130 @@ filterGiotto = function(gobject, #' @name .rna_osmfish_normalization #' @description function for RNA normalization according to osmFISH paper #' @keywords internal -.rna_osmfish_normalization = function(gobject, - raw_expr, - feat_type, - spat_unit, - name = 'custom', - verbose = TRUE) { - - # check feature type compatibility - if(!feat_type %in% c('rna', 'RNA')) { - warning('Caution: osmFISH normalization was developed for RNA in situ data \n') - } - - # 1. normalize per gene with scale-factor equal to number of genes - norm_feats = (raw_expr[]/rowSums_flex(raw_expr[])) * nrow(raw_expr[]) - # 2. normalize per cells with scale-factor equal to number of cells - norm_feats_cells = t_flex((t_flex(norm_feats)/colSums_flex(norm_feats)) * ncol(raw_expr[])) - - # return results to Giotto object - if(verbose == TRUE) message('\n osmFISH-like normalized data will be returned to the', name, 'Giotto slot \n') - - norm_feats_cells = create_expr_obj(name = name, - exprMat = norm_feats_cells, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = raw_expr@provenance) - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_expression_values(gobject = gobject, - values = norm_feats_cells) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - - return(gobject) +.rna_osmfish_normalization <- function(gobject, + raw_expr, + feat_type, + spat_unit, + name = "custom", + verbose = TRUE) { + # check feature type compatibility + if (!feat_type %in% c("rna", "RNA")) { + warning("Caution: osmFISH normalization was developed for RNA in situ + data \n") + } + + # 1. normalize per gene with scale-factor equal to number of genes + norm_feats <- (raw_expr[] / rowSums_flex(raw_expr[])) * nrow(raw_expr[]) + # 2. 
normalize per cells with scale-factor equal to number of cells + norm_feats_cells <- t_flex((t_flex(norm_feats) / + colSums_flex(norm_feats)) * ncol(raw_expr[])) + + # return results to Giotto object + if (verbose == TRUE) + message("\n osmFISH-like normalized data will be returned to the", + name, "Giotto slot \n") + + norm_feats_cells <- create_expr_obj( + name = name, + exprMat = norm_feats_cells, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = raw_expr@provenance + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_expression_values( + gobject = gobject, + values = norm_feats_cells + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + return(gobject) } #' @title RNA pearson residuals normalization #' @name .rna_pears_resid_normalization -#' @description function for RNA normalization according to Lause/Kobak et al paper +#' @description function for RNA normalization according to Lause/Kobak et al +#' paper #' Adapted from https://gist.github.com/hypercompetent/51a3c428745e1c06d826d76c3671797c#file-pearson_residuals-r #' @keywords internal -.rna_pears_resid_normalization = function(gobject, - raw_expr, - feat_type, - spat_unit, - theta = 100, - name = 'scaled', - verbose = TRUE) { - - - # print message with information # - if(verbose) message("using 'Lause/Kobak' method to normalize count matrix If used in published research, please cite: - Jan Lause, Philipp Berens, Dmitry Kobak (2020). - 'Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data' ") - - - # check feature type compatibility - if(!feat_type %in% c('rna', 'RNA')) { - warning('Caution: pearson residual normalization was developed for RNA count normalization \n') - } - - if(methods::is(raw_expr[], 'HDF5Matrix')) { - - counts_sum0 = methods::as(matrix(MatrixGenerics::colSums2(raw_expr[]),nrow=1),"HDF5Matrix") - counts_sum1 = methods::as(matrix(MatrixGenerics::rowSums2(raw_expr[]),ncol=1),"HDF5Matrix") - counts_sum = sum(raw_expr[]) - - #get residuals - mu = (counts_sum1 %*% counts_sum0) / counts_sum - z = (raw_expr[] - mu) / sqrt(mu + mu^2/theta) - - #clip to sqrt(n) - n = ncol(raw_expr[]) - z[z > sqrt(n)] = sqrt(n) - z[z < -sqrt(n)] = -sqrt(n) - - } else { - - - counts_sum0 = methods::as(matrix(Matrix::colSums(raw_expr[]),nrow=1),"dgCMatrix") - counts_sum1 = methods::as(matrix(Matrix::rowSums(raw_expr[]),ncol=1),"dgCMatrix") - counts_sum = sum(raw_expr[]) - - #get residuals - mu = (counts_sum1 %*% counts_sum0) / counts_sum - z = (raw_expr[] - mu) / sqrt(mu + mu^2/theta) - - #clip to sqrt(n) - n = ncol(raw_expr[]) - z[z > sqrt(n)] = sqrt(n) - z[z < -sqrt(n)] = -sqrt(n) - - } - - # return results to Giotto object - if(verbose == TRUE) message('\n Pearson residual normalized data will be returned to the ', name, ' Giotto slot \n') +.rna_pears_resid_normalization <- function(gobject, + raw_expr, + feat_type, + spat_unit, + theta = 100, + name = "scaled", + verbose = TRUE) { + # print message with information # + if (verbose) + message("using 'Lause/Kobak' method to normalize count matrix If used in + published research, please cite: + Jan Lause, Philipp Berens, Dmitry Kobak (2020). 
+ 'Analytic Pearson residuals for normalization of single-cell RNA-seq UMI + data' ") + + + # check feature type compatibility + if (!feat_type %in% c("rna", "RNA")) { + warning("Caution: pearson residual normalization was developed for RNA + count normalization \n") + } - z = create_expr_obj(name = name, - exprMat = z, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = raw_expr@provenance) + if (methods::is(raw_expr[], "HDF5Matrix")) { + counts_sum0 <- methods::as(matrix( + MatrixGenerics::colSums2(raw_expr[]), nrow = 1), "HDF5Matrix") + counts_sum1 <- methods::as(matrix( + MatrixGenerics::rowSums2(raw_expr[]), ncol = 1), "HDF5Matrix") + counts_sum <- sum(raw_expr[]) + + # get residuals + mu <- (counts_sum1 %*% counts_sum0) / counts_sum + z <- (raw_expr[] - mu) / sqrt(mu + mu^2 / theta) + + # clip to sqrt(n) + n <- ncol(raw_expr[]) + z[z > sqrt(n)] <- sqrt(n) + z[z < -sqrt(n)] <- -sqrt(n) + } else { + counts_sum0 <- methods::as(matrix(Matrix::colSums( + raw_expr[]), nrow = 1), "dgCMatrix") + counts_sum1 <- methods::as(matrix(Matrix::rowSums( + raw_expr[]), ncol = 1), "dgCMatrix") + counts_sum <- sum(raw_expr[]) + + # get residuals + mu <- (counts_sum1 %*% counts_sum0) / counts_sum + z <- (raw_expr[] - mu) / sqrt(mu + mu^2 / theta) + + # clip to sqrt(n) + n <- ncol(raw_expr[]) + z[z > sqrt(n)] <- sqrt(n) + z[z < -sqrt(n)] <- -sqrt(n) + } - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_expression_values(gobject = gobject, - values = z) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + # return results to Giotto object + if (verbose == TRUE) + message("\n Pearson residual normalized data will be returned to the ", + name, " Giotto slot \n") + + z <- create_expr_obj( + name = name, + exprMat = z, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = raw_expr@provenance + ) - return(gobject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_expression_values( + gobject = gobject, + values = z + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + return(gobject) } @@ -887,13 +983,14 @@ filterGiotto = function(gobject, #' @param scale_cells z-score cells over all genes #' @param scale_order order to scale feats and cells #' @param theta theta parameter for the pearson residual normalization step -#' @param update_slot slot or name to use for the results from osmFISH and pearson residual normalization +#' @param update_slot slot or name to use for the results from osmFISH and +#' pearson residual normalization #' @param verbose be verbose #' @return giotto object #' @details Currently there are two 'methods' to normalize your raw counts data. #' -#' A. The standard method follows the standard protocol which can be adjusted using -#' the provided parameters and follows the following order: \cr +#' A. The standard method follows the standard protocol which can be adjusted +#' using the provided parameters and follows the following order: \cr #' \itemize{ #' \item{1. Data normalization for total library size and scaling by a custom scale-factor.} #' \item{2. Log transformation of data.} @@ -911,247 +1008,269 @@ filterGiotto = function(gobject, #' \item{1. First calculate expected values based on Pearson correlations.} #' \item{2. 
Next calculate z-scores based on observed and expected values.} #' } -#' By default the latter two results will be saved in the Giotto slot for scaled expression, -#' this can be changed by changing the update_slot parameters +#' By default the latter two results will be saved in the Giotto slot for +#' scaled expression, this can be changed by changing the update_slot parameters #' @export -normalizeGiotto = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = 'raw', - norm_methods = c('standard', 'pearson_resid', 'osmFISH'), - library_size_norm = TRUE, - scalefactor = 6e3, - log_norm = TRUE, - log_offset = 1, - logbase = 2, - scale_feats = TRUE, - scale_genes = NULL, - scale_cells = TRUE, - scale_order = c('first_feats', 'first_cells'), - theta = 100, - update_slot = 'scaled', - verbose = TRUE) { - - - - ## deprecated arguments - if(!is.null(scale_genes)) { - scale_feats = scale_genes - warning('scale_genes is deprecated, use scale_feats in the future \n') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## default is to start from raw data - values = match.arg(expression_values, unique(c('raw', expression_values))) - raw_expr = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - norm_methods = match.arg(arg = norm_methods, choices = c('standard', 'pearson_resid', 'osmFISH')) - - # normalization according to standard methods - if(norm_methods == 'standard') { - - gobject = .rna_standard_normalization(gobject = gobject, - raw_expr = raw_expr, - feat_type = feat_type, - spat_unit = spat_unit, - library_size_norm = library_size_norm, - scalefactor = scalefactor, - log_norm = log_norm, - log_offset = log_offset, - logbase = logbase, - scale_feats = scale_feats, - scale_cells = scale_cells, - scale_order = scale_order, - verbose = verbose) - - - } - - else if(norm_methods == 'osmFISH') { - - gobject = .rna_osmfish_normalization(gobject = gobject, - raw_expr = raw_expr, - feat_type = feat_type, - spat_unit = spat_unit, - name = update_slot, - verbose = verbose) - - } - - else if(norm_methods == 'pearson_resid') { - - gobject = .rna_pears_resid_normalization(gobject = gobject, - raw_expr = raw_expr, - feat_type = feat_type, - spat_unit = spat_unit, - theta = theta, - name = update_slot, - verbose = verbose) - - } - - ## update parameters used ## - - # Do not update downstream of processGiotto - # Parameters will be updated within processGiotto - try({ - upstream_func = sys.call(-2) - fname = as.character(upstream_func[[1]]) - if (fname == 'processGiotto') return(gobject) - }, - silent = TRUE) - - - # If this function call is not downstream of processGiotto, update normally - gobject = update_giotto_params(gobject, description = '_normalize') - - return(gobject) +normalizeGiotto <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = "raw", + norm_methods = c("standard", "pearson_resid", "osmFISH"), + library_size_norm = TRUE, + scalefactor = 6e3, + log_norm = TRUE, + log_offset = 1, + logbase = 2, + scale_feats = TRUE, + scale_genes = NULL, + scale_cells = TRUE, + scale_order = c("first_feats", "first_cells"), + theta = 100, + update_slot = "scaled", + verbose = TRUE) { + ## deprecated arguments + if (!is.null(scale_genes)) { + scale_feats <- scale_genes + warning("scale_genes is 
deprecated, use scale_feats in the future \n") + } + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + ## default is to start from raw data + values <- match.arg(expression_values, unique(c("raw", expression_values))) + raw_expr <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + norm_methods <- match.arg( + arg = norm_methods, choices = c("standard", "pearson_resid", "osmFISH")) + + # normalization according to standard methods + if (norm_methods == "standard") { + gobject <- .rna_standard_normalization( + gobject = gobject, + raw_expr = raw_expr, + feat_type = feat_type, + spat_unit = spat_unit, + library_size_norm = library_size_norm, + scalefactor = scalefactor, + log_norm = log_norm, + log_offset = log_offset, + logbase = logbase, + scale_feats = scale_feats, + scale_cells = scale_cells, + scale_order = scale_order, + verbose = verbose + ) + } else if (norm_methods == "osmFISH") { + gobject <- .rna_osmfish_normalization( + gobject = gobject, + raw_expr = raw_expr, + feat_type = feat_type, + spat_unit = spat_unit, + name = update_slot, + verbose = verbose + ) + } else if (norm_methods == "pearson_resid") { + gobject <- .rna_pears_resid_normalization( + gobject = gobject, + raw_expr = raw_expr, + feat_type = feat_type, + spat_unit = spat_unit, + theta = theta, + name = update_slot, + verbose = verbose + ) + } + + ## update parameters used ## + + # Do not update downstream of processGiotto + # Parameters will be updated within processGiotto + try( + { + upstream_func <- sys.call(-2) + fname <- as.character(upstream_func[[1]]) + if (fname == "processGiotto") { + return(gobject) + } + }, + silent = TRUE + ) + + + # If this function call is not downstream of processGiotto, update normally + gobject <- update_giotto_params(gobject, description = "_normalize") + + return(gobject) } #' @title Adjust expression values #' @name adjustGiottoMatrix -#' @description Adjust expression values to account for known batch effects or technological covariates. +#' @description Adjust expression values to account for known batch effects or +#' technological covariates. #' @inheritParams data_access_params #' @param expression_values expression values to use -#' @param batch_columns metadata columns that represent different batch (max = 2) -#' @param covariate_columns metadata columns that represent covariates to regress out +#' @param batch_columns metadata columns that represent different +#' batch (max = 2) +#' @param covariate_columns metadata columns that represent covariates to +#' regress out #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param update_slot expression slot that will be updated (default = custom) #' @return giotto object or exprObj -#' @details This function implements the \code{\link[limma]{removeBatchEffect}} function to -#' remove known batch effects and to adjust expression values according to provided covariates. +#' @details This function implements the \code{\link[limma]{removeBatchEffect}} +#' function to remove known batch effects and to adjust expression values +#' according to provided covariates. 
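# A hedged usage sketch for adjustGiottoMatrix() as documented above: `g` is
# assumed to be a giotto object whose cell metadata holds a hypothetical
# "batch" column plus the nr_feats/total_expr columns that addCellStatistics()
# creates; values are illustrative, so the call is left commented out.
# g <- adjustGiottoMatrix(
#     gobject = g,
#     expression_values = "normalized",
#     batch_columns = "batch",
#     covariate_columns = c("nr_feats", "total_expr"),
#     update_slot = "custom"
# )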
#' @export #' adjustGiottoMatrix <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - batch_columns = NULL, - covariate_columns = NULL, - return_gobject = TRUE, - update_slot = c('custom')) { - - # Catch for both batch and covariate being null - if (is.null(batch_columns) & is.null(covariate_columns)){ - stop('\nMetadata for either different batches or covariates must be provided.') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # metadata - cell_metadata = getCellMetadata( - gobject, - feat_type = feat_type, - spat_unit = spat_unit, - output = 'data.table', - copy_obj = TRUE - ) - - if(!is.null(batch_columns)) { - if(!all(batch_columns %in% colnames(cell_metadata))) { - stop('\n batch column name(s) were not found in the cell metadata \n') - } - } - - if(!is.null(covariate_columns)) { - if(!all(covariate_columns %in% colnames(cell_metadata))) { - stop('\n covariate column name(s) were not found in the cell metadata \n') + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + batch_columns = NULL, + covariate_columns = NULL, + return_gobject = TRUE, + update_slot = c("custom")) { + # Catch for both batch and covariate being null + if (is.null(batch_columns) & is.null(covariate_columns)) { + stop("Metadata for either different batches or covariates must be + provided.") } - } - update_slot = match.arg(update_slot, c('normalized', 'scaled', 'custom', update_slot)) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + # metadata + cell_metadata <- getCellMetadata( + gobject, + feat_type = feat_type, + spat_unit = spat_unit, + output = "data.table", + copy_obj = TRUE + ) - # batch columns - if(!is.null(batch_columns)) { - batch_column_1 = cell_metadata[[ batch_columns[1] ]] - if(length(batch_columns) > 1) { - batch_column_2 = cell_metadata[[ batch_columns[2] ]] - } else { - batch_column_2 = NULL + if (!is.null(batch_columns)) { + if (!all(batch_columns %in% colnames(cell_metadata))) { + stop("batch column name(s) were not found in the cell metadata") + } } - } else { - batch_column_1 = NULL - batch_column_2 = NULL - } - - # covariate columns - if(!is.null(covariate_columns)) { - covariates = as.matrix(cell_metadata[, covariate_columns, with = F]) - } else { - covariates = NULL - } - - - # TODO: implement ResidualMatrix to work with a delayed matrix - adjusted_matrix = limma::removeBatchEffect(x = expr_data[], - batch = batch_column_1, - batch2 = batch_column_2, - covariates = covariates) - - if(return_gobject == TRUE) { - - adjusted_matrix = create_expr_obj(name = update_slot, - exprMat = adjusted_matrix, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = expr_data@provenance) - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_expression_values(gobject = gobject, - values = adjusted_matrix) - 
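# A small, self-contained illustration of what the limma::removeBatchEffect()
# call above does, using a synthetic matrix with an artificial shift in one
# batch; requires the limma package (Bioconductor) and is illustrative only.
if (requireNamespace("limma", quietly = TRUE)) {
    set.seed(2)
    mat <- matrix(rnorm(100), nrow = 10) # 10 feats x 10 cells
    batch <- rep(c("A", "B"), each = 5) # two known batches
    mat[, batch == "B"] <- mat[, batch == "B"] + 2 # add a batch shift
    adj <- limma::removeBatchEffect(x = mat, batch = batch)
    # mean difference between batches before vs after adjustment (~0 after)
    round(c(
        before = mean(mat[, batch == "B"]) - mean(mat[, batch == "A"]),
        after = mean(adj[, batch == "B"]) - mean(adj[, batch == "A"])
    ), 2)
}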
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - ## update parameters used ## + if (!is.null(covariate_columns)) { + if (!all(covariate_columns %in% colnames(cell_metadata))) { + stop("covariate column name(s) were not found in the cell metadata") + } + } - # Do not update downstream of processGiotto - # Parameters will be updated within processGiotto - try({ - test = sys.call(-2) - fname = as.character(test[[1]]) - if (fname == 'processGiotto') return(gobject) - }, - silent = TRUE) + update_slot <- match.arg( + update_slot, c("normalized", "scaled", "custom", update_slot)) + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_data <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) - # If this function call is not downstream of processGiotto, update normally - gobject = update_giotto_params(gobject, description = '_adj_matrix') + # batch columns + if (!is.null(batch_columns)) { + batch_column_1 <- cell_metadata[[batch_columns[1]]] + if (length(batch_columns) > 1) { + batch_column_2 <- cell_metadata[[batch_columns[2]]] + } else { + batch_column_2 <- NULL + } + } else { + batch_column_1 <- NULL + batch_column_2 <- NULL + } - return(gobject) + # covariate columns + if (!is.null(covariate_columns)) { + covariates <- as.matrix( + cell_metadata[, covariate_columns, with = FALSE]) + } else { + covariates <- NULL + } - } else { - return(adjusted_matrix) - } + # TODO: implement ResidualMatrix to work with a delayed matrix + adjusted_matrix <- limma::removeBatchEffect( + x = expr_data[], + batch = batch_column_1, + batch2 = batch_column_2, + covariates = covariates + ) + if (return_gobject == TRUE) { + adjusted_matrix <- create_expr_obj( + name = update_slot, + exprMat = adjusted_matrix, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = expr_data@provenance + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_expression_values( + gobject = gobject, + values = adjusted_matrix + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + ## update parameters used ## + + # Do not update downstream of processGiotto + # Parameters will be updated within processGiotto + try( + { + test <- sys.call(-2) + fname <- as.character(test[[1]]) + if (fname == "processGiotto") { + return(gobject) + } + }, + silent = TRUE + ) + + + # If this function call is not downstream of processGiotto, update + # normally + gobject <- update_giotto_params(gobject, description = "_adj_matrix") + + return(gobject) + } else { + return(adjusted_matrix) + } } @@ -1162,50 +1281,56 @@ adjustGiottoMatrix <- function(gobject, #' @param filter_params additional parameters to filterGiotto #' @param norm_params additional parameters to normalizeGiotto #' @param stat_params additional parameters to addStatistics -#' @param adjust_params additional parameters to adjustGiottoMatrix; set to NULL if not required +#' @param adjust_params additional parameters to adjustGiottoMatrix; set to +#' NULL if not required #' @param verbose be verbose (default is TRUE) #' @return giotto object #' @details See \code{\link{filterGiotto}}, \code{\link{normalizeGiotto}}, #' \code{\link{addStatistics}}, and \code{\link{adjustGiottoMatrix}}. For more -#' information about the different parameters in each step. If you do not provide -#' them it will use the default values. 
If no adjustment is required, adjust_params must be set to NULL +#' information about the different parameters in each step. If you do not +#' provide them it will use the default values. If no adjustment is required, +#' adjust_params must be set to NULL #' @export #' -processGiotto = function(gobject, - filter_params = list(), - norm_params = list(), - stat_params = list(), - adjust_params = list(), - verbose = TRUE) { - - # filter Giotto - if(verbose == TRUE) cat('1. start filter step \n') - if(!inherits(filter_params, 'list')) stop('filter_params need to be a list of parameters for filterGiotto \n') - gobject = do.call('filterGiotto', c(gobject = gobject, filter_params)) - - # normalize Giotto - if(verbose == TRUE) cat('2. start normalization step \n') - if(!inherits(norm_params, 'list')) stop('norm_params need to be a list of parameters for normalizeGiotto \n') - gobject = do.call('normalizeGiotto', c(gobject = gobject, norm_params)) - - # add Statistics - if(verbose == TRUE) cat('3. start cell and gene statistics step \n') - if(!inherits(stat_params, 'list')) stop('stat_params need to be a list of parameters for addStatistics \n') - stat_params[['return_gobject']] = TRUE # force this to be true - gobject = do.call('addStatistics', c(gobject = gobject, stat_params)) - - # adjust Giotto, if applicable - if(!is.null(adjust_params)){ - if(verbose == TRUE) cat('4. start adjusted matrix step \n') - if(!inherits(adjust_params, 'list')) stop('adjust_params need to be a list of parameters for adjustGiottoMatrix \n') - adjust_params[['return_gobject']] = TRUE # force this to be true - gobject = do.call('adjustGiottoMatrix', c(gobject = gobject, adjust_params)) - } - - gobject = update_giotto_params(gobject, description = '_process') - - return(gobject) +processGiotto <- function(gobject, + filter_params = list(), + norm_params = list(), + stat_params = list(), + adjust_params = list(), + verbose = TRUE) { + # filter Giotto + if (verbose == TRUE) message("1. start filter step") + if (!inherits(filter_params, "list")) + stop("filter_params need to be a list of parameters for filterGiotto") + gobject <- do.call("filterGiotto", c(gobject = gobject, filter_params)) + + # normalize Giotto + if (verbose == TRUE) message("2. start normalization step") + if (!inherits(norm_params, "list")) + stop("norm_params need to be a list of parameters for normalizeGiotto") + gobject <- do.call("normalizeGiotto", c(gobject = gobject, norm_params)) + + # add Statistics + if (verbose == TRUE) message("3. start cell and gene statistics step") + if (!inherits(stat_params, "list")) + stop("stat_params need to be a list of parameters for addStatistics ") + stat_params[["return_gobject"]] <- TRUE # force this to be true + gobject <- do.call("addStatistics", c(gobject = gobject, stat_params)) + + # adjust Giotto, if applicable + if (!is.null(adjust_params)) { + if (verbose == TRUE) message("4. 
start adjusted matrix step") + if (!inherits(adjust_params, "list")) + stop("adjust_params need to be a list of parameters for + adjustGiottoMatrix") + adjust_params[["return_gobject"]] <- TRUE # force this to be true + gobject <- do.call( + "adjustGiottoMatrix", c(gobject = gobject, adjust_params)) + } + + gobject <- update_giotto_params(gobject, description = "_process") + return(gobject) } @@ -1245,117 +1370,137 @@ processGiotto = function(gobject, #' } #' @export addFeatStatistics <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - detection_threshold = 0, - return_gobject = TRUE) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - expression_values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data <- getExpression( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = expression_values, - output = 'exprObj', - set_defaults = FALSE - ) - - # calculate stats - feat_stats = data.table::data.table(feats = rownames(expr_data[]), - nr_cells = rowSums_flex(expr_data[] > detection_threshold), - perc_cells = (rowSums_flex(expr_data[] > detection_threshold)/ncol(expr_data[]))*100, - total_expr = rowSums_flex(expr_data[]), - mean_expr = rowMeans_flex(expr_data[])) - - # data.table variables - mean_expr_det = NULL - - mean_expr_detected = .mean_expr_det_test(expr_data[], detection_threshold = detection_threshold) - feat_stats[, mean_expr_det := mean_expr_detected] - - - if(return_gobject == TRUE) { - - # remove previous statistics - feat_metadata <- getFeatureMetadata( - gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'featMetaObj', - copy_obj = TRUE, - set_defaults = FALSE - ) - - if(isS4(expr_data)) { - if(!identical(expr_data@provenance, feat_metadata@provenance)) { - warning('expression and feature metadata provenance mismatch') - } - } - - - metadata_names = colnames(feat_metadata[]) - - if('nr_cells' %in% metadata_names) { - cat('\n feat statistics has already been applied once, will be overwritten \n') - feat_metadata[][, c('nr_cells', 'perc_cells', 'total_expr', 'mean_expr', 'mean_expr_det') := NULL] - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_feature_metadata(gobject, - metadata = feat_metadata, - verbose = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } - - gobject = addFeatMetadata(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - new_metadata = feat_stats, - by_column = TRUE, - column_feat_ID = 'feats') - - ## update parameters used ## - - # parent function name - cl = sys.call(-1) + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + detection_threshold = 0, + return_gobject = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # Do not update downstream of processGiotto - # Parameters will be updated within processGiotto - try({ - upstream_func = sys.call(-3) - fname = as.character(upstream_func[[1]]) - if (fname == 'processGiotto') return(gobject) - }, - silent = TRUE) + # 
expression values to be used + expression_values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_data <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = expression_values, + output = "exprObj", + set_defaults = FALSE + ) + # calculate stats + feat_stats <- data.table::data.table( + feats = rownames(expr_data[]), + nr_cells = rowSums_flex(expr_data[] > detection_threshold), + perc_cells = (rowSums_flex(expr_data[] > detection_threshold) / + ncol(expr_data[])) * 100, + total_expr = rowSums_flex(expr_data[]), + mean_expr = rowMeans_flex(expr_data[]) + ) - # If this function call is not downstream of processGiotto, update normally - if(is.null(cl)) { - gobject = update_giotto_params(gobject, description = '_feat_stats') + # data.table variables + mean_expr_det <- NULL + + mean_expr_detected <- .mean_expr_det_test( + expr_data[], detection_threshold = detection_threshold) + feat_stats[, mean_expr_det := mean_expr_detected] + + + if (return_gobject == TRUE) { + # remove previous statistics + feat_metadata <- getFeatureMetadata( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "featMetaObj", + copy_obj = TRUE, + set_defaults = FALSE + ) + + if (isS4(expr_data)) { + if (!identical(expr_data@provenance, feat_metadata@provenance)) { + warning("expression and feature metadata provenance mismatch") + } + } + + + metadata_names <- colnames(feat_metadata[]) + + if ("nr_cells" %in% metadata_names) { + message("feat statistics has already been applied once, will be + overwritten") + feat_metadata[][, c( + "nr_cells", "perc_cells", "total_expr", "mean_expr", + "mean_expr_det") := NULL] + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_feature_metadata(gobject, + metadata = feat_metadata, + verbose = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + + gobject <- addFeatMetadata( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + new_metadata = feat_stats, + by_column = TRUE, + column_feat_ID = "feats" + ) + + ## update parameters used ## + + # parent function name + cl <- sys.call(-1) + + # Do not update downstream of processGiotto + # Parameters will be updated within processGiotto + try( + { + upstream_func <- sys.call(-3) + fname <- as.character(upstream_func[[1]]) + if (fname == "processGiotto") { + return(gobject) + } + }, + silent = TRUE + ) + + + # If this function call is not downstream of processGiotto, update + # normally + if (is.null(cl)) { + gobject <- update_giotto_params(gobject, + description = "_feat_stats") + } else { + fname <- as.character(cl[[1]]) + if (fname == "addStatistics") { + gobject <- update_giotto_params(gobject, + description = "_feat_stats", + toplevel = 3) + } else { + gobject <- update_giotto_params(gobject, + description = "_feat_stats") + } + } + + + return(gobject) } else { - fname = as.character(cl[[1]]) - if(fname == 'addStatistics') { - gobject = update_giotto_params(gobject, description = '_feat_stats', toplevel = 3) - } else { - gobject = update_giotto_params(gobject, description = '_feat_stats') - } + return(feat_stats) } - - - return(gobject) - - - } else { - return(feat_stats) - } - } @@ -1381,112 +1526,127 @@ addFeatStatistics <- function(gobject, #' } #' @export addCellStatistics <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - detection_threshold = 0, - 
return_gobject = TRUE) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - expression_values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data <- getExpression( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = expression_values, - output = 'exprObj', - set_defaults = FALSE - ) - - # calculate stats - - cell_stats = data.table::data.table(cells = colnames(expr_data[]), - nr_feats = colSums_flex(expr_data[] > detection_threshold), - perc_feats = (colSums_flex(expr_data[] > detection_threshold)/nrow(expr_data[]))*100, - total_expr = colSums_flex(expr_data[])) - - if(return_gobject == TRUE) { - - # remove previous statistics - cell_metadata = getCellMetadata( - gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE, - set_defaults = FALSE - ) - - if(isS4(expr_data)) { - if(!identical(expr_data@provenance, cell_metadata@provenance)) { - warning('expression and feature metadata provenance mismatch') - } - } - - metadata_names = colnames(cell_metadata[]) - if('nr_feats' %in% metadata_names) { - message('\n cells statistics has already been applied once, will be overwritten \n') - cell_metadata[][, c('nr_feats', 'perc_feats', 'total_expr') := NULL] - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_cell_metadata(gobject, - metadata = cell_metadata, - verbose = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } - - - - - gobject = addCellMetadata(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - new_metadata = cell_stats, - by_column = TRUE, - column_cell_ID = 'cells') + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + detection_threshold = 0, + return_gobject = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - ## update parameters used ## + # expression values to be used + expression_values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_data <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = expression_values, + output = "exprObj", + set_defaults = FALSE + ) - # parent function name - cl = sys.call(-1) + # calculate stats - # Do not update downstream of processGiotto - # Parameters will be updated within processGiotto - try({ - upstream_func = sys.call(-3) - fname = as.character(upstream_func[[1]]) - if (fname == 'processGiotto') return(gobject) - }, - silent = TRUE) + cell_stats <- data.table::data.table( + cells = colnames(expr_data[]), + nr_feats = colSums_flex(expr_data[] > detection_threshold), + perc_feats = (colSums_flex(expr_data[] > detection_threshold) / + nrow(expr_data[])) * 100, + total_expr = colSums_flex(expr_data[]) + ) - # If this function call is not downstream of processGiotto, update normally - if(is.null(cl)) { - gobject = update_giotto_params(gobject, description = '_cell_stats') + if (return_gobject == TRUE) { + # remove previous statistics + cell_metadata <- getCellMetadata( + gobject, + spat_unit = 
spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE, + set_defaults = FALSE + ) + + if (isS4(expr_data)) { + if (!identical(expr_data@provenance, cell_metadata@provenance)) { + warning("expression and feature metadata provenance mismatch") + } + } + + metadata_names <- colnames(cell_metadata[]) + if ("nr_feats" %in% metadata_names) { + message("cells statistics has already been applied once, will be + overwritten") + cell_metadata[][, c("nr_feats", "perc_feats", "total_expr") := NULL] + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_cell_metadata(gobject, + metadata = cell_metadata, + verbose = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + + + + + gobject <- addCellMetadata( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + new_metadata = cell_stats, + by_column = TRUE, + column_cell_ID = "cells" + ) + + ## update parameters used ## + + # parent function name + cl <- sys.call(-1) + + # Do not update downstream of processGiotto + # Parameters will be updated within processGiotto + try( + { + upstream_func <- sys.call(-3) + fname <- as.character(upstream_func[[1]]) + if (fname == "processGiotto") { + return(gobject) + } + }, + silent = TRUE + ) + + # If this function call is not downstream of processGiotto, update + # normally + if (is.null(cl)) { + gobject <- update_giotto_params(gobject, + description = "_cell_stats") + } else { + fname <- as.character(cl[[1]]) + if (fname == "addStatistics") { + gobject <- update_giotto_params(gobject, + description = "_cell_stats", + toplevel = 3) + } else { + gobject <- update_giotto_params(gobject, + description = "_cell_stats") + } + } + + + return(gobject) } else { - - fname = as.character(cl[[1]]) - if(fname == 'addStatistics') { - gobject = update_giotto_params(gobject, description = '_cell_stats', toplevel = 3) - } else { - gobject = update_giotto_params(gobject, description = '_cell_stats') - } - + return(cell_stats) } - - - return(gobject) - - - } else { - return(cell_stats) - } - } @@ -1503,48 +1663,52 @@ addCellStatistics <- function(gobject, #' @details See \code{\link{addFeatStatistics}} and \code{\link{addCellStatistics}} #' @export addStatistics <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - detection_threshold = 0, - return_gobject = TRUE) { - - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # get feats statistics - feat_stats = addFeatStatistics(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - detection_threshold = detection_threshold, - return_gobject = return_gobject) - - if(return_gobject == TRUE) { - gobject = feat_stats - } - - # get cell statistics - cell_stats = addCellStatistics(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - detection_threshold = detection_threshold, - return_gobject = return_gobject) - - if(return_gobject == TRUE) { - gobject = cell_stats - return(gobject) - } else { - return(feat_stats = feat_stats, cell_stats = cell_stats) - } + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + detection_threshold = 0, + return_gobject = TRUE) { + # Set feat_type and spat_unit + 
spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + # get feats statistics + feat_stats <- addFeatStatistics( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + detection_threshold = detection_threshold, + return_gobject = return_gobject + ) + + if (return_gobject == TRUE) { + gobject <- feat_stats + } + + # get cell statistics + cell_stats <- addCellStatistics( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + detection_threshold = detection_threshold, + return_gobject = return_gobject + ) + + if (return_gobject == TRUE) { + gobject <- cell_stats + return(gobject) + } else { + return(feat_stats = feat_stats, cell_stats = cell_stats) + } } @@ -1552,7 +1716,8 @@ addStatistics <- function(gobject, #' @title addFeatsPerc #' @name addFeatsPerc -#' @description Calculates the total percentage of (normalized) counts for a subset of selected genes +#' @description Calculates the total percentage of (normalized) counts for a +#' subset of selected genes #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -1560,63 +1725,72 @@ addStatistics <- function(gobject, #' @param feats vector of selected features #' @param vector_name column name as seen in \code{\link{pDataDT}} #' @param return_gobject boolean: return giotto object (default = TRUE) -#' @return giotto object if \code{return_gobject = TRUE}, else a vector with % results +#' @return giotto object if \code{return_gobject = TRUE}, else a vector with % +#' results #' @export -addFeatsPerc = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - feats = NULL, - vector_name = 'feat_perc', - return_gobject = TRUE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # tests - if(is.null(feats)) { - stop('You need to provide a vector of feat names \n') - } - - if(!methods::is(gobject, 'giotto')) { - stop('You need to provide a giotto object \n') - } - - - # expression values to be used - expression_values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = expression_values, - output = 'matrix') - - - totalsum = colSums_flex(expr_data) - feat_sum = colSums_flex(expr_data[rownames(expr_data) %in% feats,]) - perc_feats = round((feat_sum/totalsum) * 100, 2) - - if(return_gobject) { - temp_gobj = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = perc_feats, - vector_name = vector_name, - by_column = TRUE) +addFeatsPerc <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + feats = NULL, + vector_name = "feat_perc", + return_gobject = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - ## update parameters used ## - temp_gobj = 
update_giotto_params(temp_gobj, description = '_feats_perc') + # tests + if (is.null(feats)) { + stop("You need to provide a vector of feat names") + } + + if (!methods::is(gobject, "giotto")) { + stop("You need to provide a giotto object") + } + + + # expression values to be used + expression_values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_data <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = expression_values, + output = "matrix" + ) - return(temp_gobj) - } else { - return(perc_feats) - } + totalsum <- colSums_flex(expr_data) + feat_sum <- colSums_flex(expr_data[rownames(expr_data) %in% feats, ]) + perc_feats <- round((feat_sum / totalsum) * 100, 2) + + if (return_gobject) { + temp_gobj <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = perc_feats, + vector_name = vector_name, + by_column = TRUE + ) + + ## update parameters used ## + temp_gobj <- update_giotto_params(temp_gobj, + description = "_feats_perc") + + return(temp_gobj) + } else { + return(perc_feats) + } } @@ -1631,62 +1805,66 @@ addFeatsPerc = function(gobject, #' @title Find network neighbors #' @name findNetworkNeighbors -#' @description Find the spatial neighbors for a selected group of cells within the selected spatial network. +#' @description Find the spatial neighbors for a selected group of cells within +#' the selected spatial network. #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param spatial_network_name name of spatial network -#' @param source_cell_ids cell ids for which you want to know the spatial neighbors +#' @param source_cell_ids cell ids for which you want to know the spatial +#' neighbors #' @param name name of the results #' @return data.table #' @export -findNetworkNeighbors = function(gobject, - spat_unit = NULL, - spatial_network_name = NULL, - source_cell_ids = NULL, - name = 'nb_cells') { - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - - # get spatial network - if(!is.null(spatial_network_name)) { - spatial_network = get_spatialNetwork(gobject, - spat_unit = spat_unit, - name = spatial_network_name, - output = 'networkDT') - } else { - stop('You need to select a spatial network') - } +findNetworkNeighbors <- function(gobject, + spat_unit = NULL, + spatial_network_name = NULL, + source_cell_ids = NULL, + name = "nb_cells") { + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) - # source cell ids that are found back - all_cell_ids = gobject@cell_ID[[spat_unit]] - source_cells = all_cell_ids[all_cell_ids %in% source_cell_ids] + # get spatial network + if (!is.null(spatial_network_name)) { + spatial_network <- getSpatialNetwork(gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "networkDT" + ) + } else { + stop("You need to select a spatial network") + } - if(length(source_cells) == 0) { - stop('No source cell ids were selected or found') - } + # source cell ids that are found back + all_cell_ids <- gobject@cell_ID[[spat_unit]] + source_cells <- all_cell_ids[all_cell_ids %in% source_cell_ids] + if (length(source_cells) == 0) { + stop("No source cell ids were selected or found") + } - full_network_DT = convert_to_full_spatial_network(spatial_network) - potential_target_cells = full_network_DT[source %in% source_cells][['target']] - source_and_target_cells = potential_target_cells[potential_target_cells %in% 
source_cells] - target_cells = potential_target_cells[!potential_target_cells %in% source_and_target_cells] - cell_meta = pDataDT(gobject) + full_network_DT <- convert_to_full_spatial_network(spatial_network) + potential_target_cells <- full_network_DT[ + source %in% source_cells][["target"]] + source_and_target_cells <- potential_target_cells[ + potential_target_cells %in% source_cells] + target_cells <- potential_target_cells[ + !potential_target_cells %in% source_and_target_cells] - # data.table variables - nb_cells = cell_ID = NULL + cell_meta <- pDataDT(gobject) - cell_meta[, nb_cells := ifelse(cell_ID %in% source_and_target_cells, 'both', - ifelse(cell_ID %in% source_cells, 'source', - ifelse(cell_ID %in% target_cells, 'neighbor', 'others')))] - nb_annot = cell_meta[, c('cell_ID', 'nb_cells'), with = FALSE] - data.table::setnames(nb_annot, 'nb_cells', name) + # data.table variables + nb_cells <- cell_ID <- NULL - return(nb_annot) + cell_meta[, nb_cells := ifelse(cell_ID %in% source_and_target_cells, "both", + ifelse(cell_ID %in% source_cells, "source", + ifelse(cell_ID %in% target_cells, "neighbor", "others") + ) + )] + nb_annot <- cell_meta[, c("cell_ID", "nb_cells"), with = FALSE] + data.table::setnames(nb_annot, "nb_cells", name) + return(nb_annot) } - - - - diff --git a/R/cell_segmentation.R b/R/cell_segmentation.R index 71020dba4..2a55ca149 100644 --- a/R/cell_segmentation.R +++ b/R/cell_segmentation.R @@ -1,4 +1,3 @@ - #' @title doCellSegmentation #' @name doCellSegmentation #' @description segment cells in Dapi image @@ -17,71 +16,69 @@ #' #' @export doCellSegmentation <- function(raster_img, - folder_path, - reduce_resolution = 4, - overlapping_pixels = 50, - python_path = NULL){ - - package_check('deepcell', repository = 'pip') - package_check('PIL', repository = 'pip') - - # python path (requires use of gobject) - # if(is.null(python_path)) { - # python_path = readGiottoInstructions(gobject, param = "python_path") - # } - - # prepare python path and segmentation script - reticulate::use_python(required = T, python = python_path) - python_segmentation_function = system.file("python", - "python_segmentation.py", - package = 'Giotto') - reticulate::source_python(file = python_segmentation_function) - - # create mesmer app - mesmer_app = python_create_mesmer_app() + folder_path, + reduce_resolution = 4, + overlapping_pixels = 50, + python_path = NULL) { + package_check("deepcell", repository = "pip") + package_check("PIL", repository = "pip") + + # prepare python path and segmentation script + reticulate::use_python(required = TRUE, python = python_path) + python_segmentation_function <- system.file("python", + "python_segmentation.py", + package = "Giotto" + ) + reticulate::source_python(file = python_segmentation_function) - # get params for rolling window - img_dim = dim(raster_img) - xdim = img_dim[2] - ydim = img_dim[1] + # create mesmer app + mesmer_app <- python_create_mesmer_app() - mesmer_tile_dim = 512 - tile_dim = mesmer_tile_dim * reduce_resolution - margin = overlapping_pixels - window_size = tile_dim - margin + # get params for rolling window + img_dim <- dim(raster_img) + xdim <- img_dim[2] + ydim <- img_dim[1] - nxwindow = xdim %/% window_size - nywindow = ydim %/% window_size + mesmer_tile_dim <- 512 + tile_dim <- mesmer_tile_dim * reduce_resolution + margin <- overlapping_pixels + window_size <- tile_dim - margin - # sliding window - start_x = 0 - end_x = start_x + tile_dim - for (i in 1:nxwindow) { + nxwindow <- xdim %/% window_size + nywindow <- ydim %/% 
window_size
-    start_y = 0
-    end_y = start_y + tile_dim
-    for (j in 1:nywindow) {
-      ext_crop = terra::ext(c(start_x, end_x, start_y, end_y))
-      img_crop = terra::crop(raster_img, ext_crop, snap="in")
-      img_crop_rescaled = terra::aggregate(img_crop, reduce_resolution)
-      img_coordinates = as.character(paste('sx', start_x,
-                                           'ex', end_x,
-                                           'sy', start_y,
-                                           'ey', end_y,
-                                           sep='_'))
-      file_name = file.path(folder_path, paste0(img_coordinates, '.png'))
-      segmentation_result = python_segment_image(mesmer_app,
-                                                  Matrix::as.matrix(img_crop_rescaled,
-                                                                    wide=TRUE),
-                                                  file_name)
+    # sliding window
+    start_x <- 0
+    end_x <- start_x + tile_dim
+    for (i in 1:nxwindow) {
+        start_y <- 0
+        end_y <- start_y + tile_dim
+        for (j in 1:nywindow) {
+            ext_crop <- terra::ext(c(start_x, end_x, start_y, end_y))
+            img_crop <- terra::crop(raster_img, ext_crop, snap = "in")
+            img_crop_rescaled <- terra::aggregate(img_crop, reduce_resolution)
+            img_coordinates <- as.character(paste("sx", start_x,
+                "ex", end_x,
+                "sy", start_y,
+                "ey", end_y,
+                sep = "_"
+            ))
+            file_name <- file.path(folder_path, paste0(img_coordinates, ".png"))
+            segmentation_result <- python_segment_image(
+                mesmer_app,
+                Matrix::as.matrix(img_crop_rescaled,
+                    wide = TRUE
+                ),
+                file_name
+            )
-      start_y = end_y - margin
-      end_y = start_y + tile_dim
+            start_y <- end_y - margin
+            end_y <- start_y + tile_dim
+        }
+        start_x <- end_x - margin
+        end_x <- start_x + tile_dim
    }
-    start_x = end_x - margin
-    end_x = start_x + tile_dim
-  }
-  print(segmentation_result)
+    print(segmentation_result)
}
diff --git a/R/clustering.R b/R/clustering.R
index ba60d34a6..bbf45e0bc 100644
--- a/R/clustering.R
+++ b/R/clustering.R
@@ -1,30 +1,34 @@
-
#' @title doLeidenCluster
#' @name doLeidenCluster
-#' @description cluster cells using a NN-network and the Leiden community detection algorithm
+#' @description cluster cells using a NN-network and the Leiden community
+#' detection algorithm
#' @param gobject giotto object
#' @param spat_unit spatial unit (e.g. "cell")
#' @param feat_type feature type (e.g. "rna", "dna", "protein")
#' @param name name for cluster, default to "leiden_clus"
-#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN"
+#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to
+#' "sNN"
#' @param network_name name of NN network to use, default to "sNN.pca"
#' @param python_path specify specific path to python if required
#' @param resolution resolution, default = 1
#' @param weight_col weight column to use for edges, default to "weight"
-#' @param partition_type The type of partition to use for optimisation. (e.g. "RBConfigurationVertexPartition", "ModularityVertexPartition")
+#' @param partition_type The type of partition to use for optimization.
+#' (e.g. "RBConfigurationVertexPartition", "ModularityVertexPartition")
#' @param init_membership initial membership of cells for the partition
-#' @param n_iterations number of interations to run the Leiden algorithm.
+#' @param n_iterations number of iterations to run the Leiden algorithm.
#' If the number of iterations is negative, the Leiden algorithm is run until
#' an iteration in which there was no improvement.
-#' @param return_gobject boolean: return giotto object (default = TRUE) +#' @param return_gobject Boolean: return giotto object (default = TRUE) #' @param set_seed set seed #' @param seed_number number for seed #' @return giotto object with new clusters appended to cell metadata #' @details #' This function is a wrapper for the Leiden algorithm implemented in python, #' which can detect communities in graphs of millions of nodes (cells), -#' as long as they can fit in memory. See the \url{https://github.com/vtraag/leidenalg}{leidenalg} -#' github page or the \url{https://leidenalg.readthedocs.io/en/stable/index.html}{readthedocs} +#' as long as they can fit in memory. See the +#' \url{https://github.com/vtraag/leidenalg}{leidenalg} +#' github page or the +#' \url{https://leidenalg.readthedocs.io/en/stable/index.html}{readthedocs} #' page for more information. #' #' Partition types available and information: @@ -39,157 +43,166 @@ #' Set \emph{weight_col = NULL} to give equal weight (=1) to each edge. #' #' @export -doLeidenCluster = function(gobject, - spat_unit = NULL, - feat_type = NULL, - name = 'leiden_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - python_path = NULL, - resolution = 1, - weight_col = 'weight', - partition_type = c('RBConfigurationVertexPartition', - 'ModularityVertexPartition'), - init_membership = NULL, - n_iterations = 1000, - return_gobject = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID[[spat_unit]] - - ## select network to use - igraph_object = get_NearestNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - output = 'igraph') - - - - ## select partition type - partition_type = match.arg(partition_type, - choices = c('RBConfigurationVertexPartition', 'ModularityVertexPartition')) - - ## check or make paths - # python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } - - ## prepare python path and louvain script - reticulate::use_python(required = T, python = python_path) - python_leiden_function = system.file("python", "python_leiden.py", package = 'Giotto') - reticulate::source_python(file = python_leiden_function) - - ## set seed - if(isTRUE(set_seed)) { - seed_number = as.integer(seed_number) - } else { - seed_number = as.integer(sample(x = 1:10000, size = 1)) - } - - ## extract NN network - network_edge_dt = data.table::as.data.table(igraph::as_data_frame(x = igraph_object, what = 'edges')) - - # data.table variables - weight = NULL - - # add weight for edges or set to 1 for all - if(!is.null(weight_col)) { - if(!weight_col %in% colnames(network_edge_dt)) { - stop('\n weight column is not an igraph attribute \n') - } else { - # weight is defined by attribute of igraph object - network_edge_dt = network_edge_dt[,c('from', 'to', weight_col), with = F] - data.table::setnames(network_edge_dt, weight_col, 'weight') - } - } else { - # weight is the same - network_edge_dt = network_edge_dt[,c('from', 'to'), with = F] - network_edge_dt[, weight := 1] - } - +doLeidenCluster <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + name = "leiden_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + 
python_path = NULL, + resolution = 1, + weight_col = "weight", + partition_type = c( + "RBConfigurationVertexPartition", + "ModularityVertexPartition" + ), + init_membership = NULL, + n_iterations = 1000, + return_gobject = TRUE, + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID[[spat_unit]] + ## select network to use + igraph_object <- getNearestNetwork( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + nn_type = nn_network_to_use, + name = network_name, + output = "igraph" + ) - ## do python leiden clustering - reticulate::py_set_seed(seed = seed_number, disable_hash_randomization = TRUE) - pyth_leid_result = python_leiden(df = network_edge_dt, - partition_type = partition_type, - initial_membership = init_membership, - weights = 'weight', - n_iterations = n_iterations, - seed = seed_number, - resolution_parameter = resolution) - ident_clusters_DT = data.table::data.table(cell_ID = pyth_leid_result[[1]], 'name' = pyth_leid_result[[2]]) - data.table::setnames(ident_clusters_DT, 'name', name) + ## select partition type + partition_type <- match.arg(partition_type, + choices = c( + "RBConfigurationVertexPartition", "ModularityVertexPartition") + ) + ## check or make paths + # python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") + } - ## add clusters to metadata ## - if(return_gobject == TRUE) { + ## prepare python path and louvain script + reticulate::use_python(required = TRUE, python = python_path) + python_leiden_function <- system.file("python", "python_leiden.py", + package = "Giotto") + reticulate::source_python(file = python_leiden_function) + ## set seed + if (isTRUE(set_seed)) { + seed_number <- as.integer(seed_number) + } else { + seed_number <- as.integer(sample(x = 1:10000, size = 1)) + } - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - #cluster_names = names(gobject@cell_metadata[[spat_unit]][[feat_type]]) + ## extract NN network + network_edge_dt <- data.table::as.data.table( + igraph::as_data_frame(x = igraph_object, what = "edges")) + + # data.table variables + weight <- NULL + + # add weight for edges or set to 1 for all + if (!is.null(weight_col)) { + if (!weight_col %in% colnames(network_edge_dt)) { + stop("weight column is not an igraph attribute") + } else { + # weight is defined by attribute of igraph object + network_edge_dt <- network_edge_dt[ + , c("from", "to", weight_col), with = FALSE] + data.table::setnames(network_edge_dt, weight_col, "weight") + } + } else { + # weight is the same + network_edge_dt <- network_edge_dt[, c("from", "to"), with = FALSE] + network_edge_dt[, weight := 1] + } - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = getCellMetadata( - gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE - ) - cell_metadata[][, eval(name) := NULL] - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject <- setCellMetadata( - gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE - ) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } - gobject = addCellMetadata( - 
gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' + ## do python leiden clustering + reticulate::py_set_seed(seed = seed_number, + disable_hash_randomization = TRUE) + pyth_leid_result <- python_leiden( + df = network_edge_dt, + partition_type = partition_type, + initial_membership = init_membership, + weights = "weight", + n_iterations = n_iterations, + seed = seed_number, + resolution_parameter = resolution ) - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_cluster') - return(gobject) + ident_clusters_DT <- data.table::data.table( + cell_ID = pyth_leid_result[[1]], "name" = pyth_leid_result[[2]]) + data.table::setnames(ident_clusters_DT, "name", name) - } else { - # else return clustering result - return(ident_clusters_DT) - } + ## add clusters to metadata ## + if (return_gobject == TRUE) { + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata( + gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_cluster") + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } @@ -206,7 +219,8 @@ doLeidenCluster = function(gobject, #' @param spat_unit spatial unit (e.g. "cell") #' @param feat_type feature type (e.g. "rna", "dna", "protein") #' @param name name for cluster, default to "leiden_clus" -#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN" +#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to +#' "sNN" #' @param network_name name of NN network to use, default to "sNN.pca" #' @param objective_function objective function for the leiden algo #' @param weights weights of edges @@ -217,131 +231,137 @@ doLeidenCluster = function(gobject, #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param set_seed set seed #' @param seed_number number for seed -#' @inheritDotParams igraph::cluster_leiden -graph -objective_function -resolution_parameter -beta -weights -initial_membership -n_iterations +#' @inheritDotParams igraph::cluster_leiden -graph -objective_function +#' -resolution_parameter -beta -weights -initial_membership -n_iterations #' @return giotto object with new clusters appended to cell metadata #' @details #' This function is a wrapper for the Leiden algorithm implemented in igraph, #' which can detect communities in graphs of millions of nodes (cells), -#' as long as they can fit in memory. See \code{\link[igraph]{cluster_leiden}} for more information. +#' as long as they can fit in memory. See \code{\link[igraph]{cluster_leiden}} +#' for more information. 
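For orientation, a minimal usage sketch of the two Leiden wrappers above (an editor's illustration, not part of this diff). It assumes a giotto object `g` that already holds a PCA reduction and a shared nearest-neighbour network named "sNN.pca", for example one built with createNearestNetwork(); the object and cluster names are placeholders.

g <- doLeidenCluster(
    gobject = g,
    nn_network_to_use = "sNN",
    network_name = "sNN.pca",
    resolution = 0.5,
    name = "leiden_clus"
)
# igraph-based variant; with weights = NULL (the default) the weights already
# stored on the igraph network are used
g <- doLeidenClusterIgraph(
    gobject = g,
    network_name = "sNN.pca",
    objective_function = "modularity",
    resolution_parameter = 0.5,
    name = "leiden_igraph_clus"
)
pDataDT(g)[, .N, by = "leiden_clus"] # cluster sizes now sit in cell metadata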
#' #' Set \emph{weights = NULL} to use the vertices weights associated with the igraph network. #' Set \emph{weights = NA} if you don't want to use vertices weights #' #' @export -doLeidenClusterIgraph = function(gobject, - spat_unit = NULL, - feat_type = NULL, - name = 'leiden_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - objective_function = c("modularity", "CPM"), - weights = NULL, - resolution_parameter = 1, - beta = 0.01, - initial_membership = NULL, - n_iterations = 1000, - return_gobject = TRUE, - set_seed = TRUE, - seed_number = 1234, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID[[spat_unit]] - - ## select network to use - igraph_object = get_NearestNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - output = 'igraph') - - ## select partition type - objective_function = match.arg(objective_function, - choices = c("modularity", "CPM")) - - ## set seed - if(isTRUE(set_seed)) { - seed_number = as.integer(seed_number) - set.seed(seed_number) - on.exit(expr = {GiottoUtils::random_seed(set.seed = TRUE)}, add = TRUE) - } - - # make igraph network undirected - graph_object_undirected = igraph::as.undirected(igraph_object) - leiden_clusters = igraph::cluster_leiden( - graph = graph_object_undirected, - objective_function = objective_function, - resolution_parameter = resolution_parameter, - beta = beta, - weights = weights, - initial_membership = initial_membership, - n_iterations = n_iterations, - ... - ) - - # summarize results - ident_clusters_DT = data.table::data.table('cell_ID' = leiden_clusters$names, 'name' = leiden_clusters$membership) - data.table::setnames(ident_clusters_DT, 'name', name) - - - - ## add clusters to metadata ## - if(isTRUE(return_gobject)) { - - - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - #cluster_names = names(gobject@cell_metadata[[spat_unit]][[feat_type]]) - - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_metadata[][, eval(name) := NULL] - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = setCellMetadata(gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } +doLeidenClusterIgraph <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + name = "leiden_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + objective_function = c("modularity", "CPM"), + weights = NULL, + resolution_parameter = 1, + beta = 0.01, + initial_membership = NULL, + n_iterations = 1000, + return_gobject = TRUE, + set_seed = TRUE, + seed_number = 1234, + ...) 
{ + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID[[spat_unit]] - gobject = addCellMetadata( + ## select network to use + igraph_object <- getNearestNetwork( gobject = gobject, spat_unit = spat_unit, feat_type = feat_type, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' + nn_type = nn_network_to_use, + name = network_name, + output = "igraph" + ) + + ## select partition type + objective_function <- match.arg(objective_function, + choices = c("modularity", "CPM") ) - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_cluster') - return(gobject) + ## set seed + if (isTRUE(set_seed)) { + seed_number <- as.integer(seed_number) + set.seed(seed_number) + on.exit(expr = { + GiottoUtils::random_seed(set.seed = TRUE) + }, add = TRUE) + } + + # make igraph network undirected + graph_object_undirected <- igraph::as.undirected(igraph_object) + leiden_clusters <- igraph::cluster_leiden( + graph = graph_object_undirected, + objective_function = objective_function, + resolution_parameter = resolution_parameter, + beta = beta, + weights = weights, + initial_membership = initial_membership, + n_iterations = n_iterations, + ... + ) + # summarize results + ident_clusters_DT <- data.table::data.table( + "cell_ID" = leiden_clusters$names, "name" = leiden_clusters$membership) + data.table::setnames(ident_clusters_DT, "name", name) - } else { - # else return clustering result - return(ident_clusters_DT) - } + ## add clusters to metadata ## + if (isTRUE(return_gobject)) { + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata(gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_cluster") + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } @@ -352,102 +372,112 @@ doLeidenClusterIgraph = function(gobject, #' @title doGiottoClustree #' @name doGiottoClustree -#' @description cluster cells using leiden methodology to visualize different resolutions +#' @description cluster cells using leiden methodology to visualize different +#' resolutions #' @param gobject giotto object #' @param res_vector vector of different resolutions to test -#' @param res_seq list of float numbers indicating start, end, and step size for resolution testing, i.e. (0.1, 0.6, 0.1) +#' @param res_seq list of float numbers indicating start, end, and step size +#' for resolution testing, i.e. 
(0.1, 0.6, 0.1) #' @param return_gobject default FALSE. See details for more info. #' @param show_plot by default, pulls from provided gobject instructions #' @param save_plot by default, pulls from provided gobject instructions #' @param return_plot by default, pulls from provided gobject instructions -#' @param save_param list of saving parameters from [GiottoVisuals::all_plots_save_function()] +#' @param save_param list of saving parameters from +#' [GiottoVisuals::all_plots_save_function()] #' @param default_save_name name of saved plot, default "clustree" #' @param verbose be verbose #' @inheritDotParams clustree::clustree -x #' @return a plot object (default), OR a giotto object (if specified) -#' @details This function tests different resolutions for Leiden clustering and provides a visualization -#' of cluster sizing as resolution varies. +#' @details This function tests different resolutions for Leiden clustering and +#' provides a visualization of cluster sizing as resolution varies. #' -#' By default, the tested leiden clusters are NOT saved to the Giotto object, and a plot is returned. +#' By default, the tested leiden clusters are NOT saved to the Giotto object, +#' and a plot is returned. #' -#' If return_gobject is set to TRUE, and a giotto object with *all* tested leiden cluster information +#' If return_gobject is set to TRUE, and a giotto object with *all* tested +#' leiden cluster information #' will be returned. #' @seealso \code{\link{doLeidenCluster}} #' @md #' @export doGiottoClustree <- function(gobject, - res_vector = NULL, - res_seq = NULL, - return_gobject = FALSE, - show_plot = NULL, - save_plot = NULL, - return_plot = NULL, - save_param = list(), - default_save_name = "clustree", - verbose = TRUE, - ...) { - - package_check(pkg_name = "clustree", repository = "CRAN") - ## setting resolutions to use - if(is.null(res_vector)){ - if(!is.null(res_seq)){ - res_vector = seq(res_seq[1], res_seq[2], res_seq[3]) - } else stop("Please input res_vector or res_seq parameters") - } - - ## performing multiple leiden clusters at resolutions specified - for (i in res_vector){ - if (isTRUE(verbose)) wrap_msg('Calculating leiden res:', i) - gobject = doLeidenCluster( - gobject = gobject, - resolution = i, - name = paste0("leiden_clustree_", i) + res_vector = NULL, + res_seq = NULL, + return_gobject = FALSE, + show_plot = NULL, + save_plot = NULL, + return_plot = NULL, + save_param = list(), + default_save_name = "clustree", + verbose = TRUE, + ...) { + package_check(pkg_name = "clustree", repository = "CRAN") + ## setting resolutions to use + if (is.null(res_vector)) { + if (!is.null(res_seq)) { + res_vector <- seq(res_seq[1], res_seq[2], res_seq[3]) + } else { + stop("Please input res_vector or res_seq parameters") + } + } + + ## performing multiple leiden clusters at resolutions specified + for (i in res_vector) { + if (isTRUE(verbose)) wrap_msg("Calculating leiden res:", i) + gobject <- doLeidenCluster( + gobject = gobject, + resolution = i, + name = paste0("leiden_clustree_", i) + ) + } + + ## plotting clustree graph + pl <- clustree::clustree( + x = pDataDT(gobject), + prefix = "leiden_clustree_", + ... ) - } - - ## plotting clustree graph - pl = clustree::clustree( - x = pDataDT(gobject), - prefix = "leiden_clustree_", - ... 
- ) - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title doLouvainCluster community #' @name .doLouvainCluster_community -#' @description cluster cells using a NN-network and the Louvain algorithm from the community module in Python +#' @description cluster cells using a NN-network and the Louvain algorithm +#' from the community module in Python #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. "cell") #' @param feat_type feature type (e.g. "rna", "dna", "protein") #' @param name name for cluster, default to "louvain_clus" -#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN" +#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to +#' "sNN" #' @param network_name name of NN network to use, default to "sNN.pca" #' @param python_path specify specific path to python if required #' @param resolution resolution, default = 1 #' @param weight_col weight column to use for edges -#' @param louv_random Will randomize the node evaluation order and the community evaluation -#' order to get different partitions at each call (default = FALSE) +#' @param louv_random Will randomize the node evaluation order and the +#' community evaluation order to get different partitions at each call +#' (default = FALSE) #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param set_seed set seed (default = FALSE) #' @param seed_number number for seed #' @param \dots additional params to pass #' @return giotto object with new clusters appended to cell metadata -#' @details This function is a wrapper for the Louvain algorithm implemented in Python, -#' which can detect communities in graphs of nodes (cells). -#' See the \url{https://python-louvain.readthedocs.io/en/latest/index.html}{readthedocs} +#' @details This function is a wrapper for the Louvain algorithm implemented in +#' Python, which can detect communities in graphs of nodes (cells). +#' See the +#' \url{https://python-louvain.readthedocs.io/en/latest/index.html}{readthedocs} #' page for more information. #' #' Set \emph{weight_col = NULL} to give equal weight (=1) to each edge. @@ -455,299 +485,329 @@ doGiottoClustree <- function(gobject, #' @keywords internal #' .doLouvainCluster_community <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - name = 'louvain_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - python_path = NULL, - resolution = 1, - weight_col = NULL, - louv_random = FALSE, - return_gobject = TRUE, - set_seed = FALSE, - seed_number = 1234, - ...) 
{ - - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID[[spat_unit]] - - ## select network to use - igraph_object = get_NearestNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - output = 'igraph') - - ## check or make paths - # python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } - - # prepare python path and louvain script - reticulate::use_python(required = TRUE, python = python_path) - python_louvain_function = system.file("python", "python_louvain.py", package = 'Giotto') - reticulate::source_python(file = python_louvain_function) - - # set seed - if(isTRUE(set_seed)) { - seed_number = as.integer(seed_number) - } else { - seed_number = as.integer(sample(x = 1:10000, size = 1)) - } - - network_edge_dt = data.table::as.data.table(igraph::as_data_frame(x = igraph_object, what = 'edges')) - - # data.table variables - weight = NULL - - if(!is.null(weight_col)) { - - if(!weight_col %in% colnames(network_edge_dt)) { - stop('\n weight column is not an igraph attribute \n') - } else { - # weight is defined by attribute of igraph object - network_edge_dt = network_edge_dt[,c('from', 'to', weight_col), with = FALSE] - setnames(network_edge_dt, weight_col, 'weight') - } - } else { - # weight is the same - network_edge_dt = network_edge_dt[,c('from', 'to'), with = FALSE] - network_edge_dt[, weight := 1] - } - - # do python louvain clustering - if(louv_random == FALSE) { - reticulate::py_set_seed(seed = seed_number, disable_hash_randomization = TRUE) - pyth_louv_result = python_louvain(df = network_edge_dt, resolution = resolution, randomize = FALSE) - } else { - reticulate::py_set_seed(seed = seed_number, disable_hash_randomization = TRUE) - pyth_louv_result = python_louvain(df = network_edge_dt, resolution = resolution, random_state = seed_number) - } - ident_clusters_DT = data.table::data.table(cell_ID = rownames(pyth_louv_result), 'name' = pyth_louv_result[[1]]) - data.table::setnames(ident_clusters_DT, 'name', name) - - - ## return - if(isTRUE(return_gobject)) { - - # get cell metadata names - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - - - # set name/cluster results to NULL if already exist - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_metadata[][, eval(name) := NULL] - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = setCellMetadata(gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + spat_unit = NULL, + feat_type = NULL, + name = "louvain_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + python_path = NULL, + resolution = 1, + weight_col = NULL, + louv_random = FALSE, + return_gobject = TRUE, + set_seed = FALSE, + seed_number = 1234, + ...) 
{ + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID[[spat_unit]] + + ## select network to use + igraph_object <- getNearestNetwork( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + nn_type = nn_network_to_use, + name = network_name, + output = "igraph" + ) + + ## check or make paths + # python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") } + # prepare python path and louvain script + reticulate::use_python(required = TRUE, python = python_path) + python_louvain_function <- system.file( + "python", "python_louvain.py", package = "Giotto") + reticulate::source_python(file = python_louvain_function) - # add new metadata information - gobject = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, column_cell_ID = 'cell_ID') + # set seed + if (isTRUE(set_seed)) { + seed_number <- as.integer(seed_number) + } else { + seed_number <- as.integer(sample(x = 1:10000, size = 1)) + } - ## update parameters used ## + network_edge_dt <- data.table::as.data.table(igraph::as_data_frame( + x = igraph_object, what = "edges")) - # 1. get parent function name - cl = sys.call(-1) + # data.table variables + weight <- NULL - # 2. check if this function call is within doLouvainCluster - if(is.null(cl)) { - gobject = update_giotto_params(gobject, description = '_cluster') + if (!is.null(weight_col)) { + if (!weight_col %in% colnames(network_edge_dt)) { + stop("weight column is not an igraph attribute") + } else { + # weight is defined by attribute of igraph object + network_edge_dt <- network_edge_dt[ + , c("from", "to", weight_col), with = FALSE] + setnames(network_edge_dt, weight_col, "weight") + } } else { - fname = as.character(cl[[1]]) - if(fname == 'doLouvainCluster') { - gobject = update_giotto_params(gobject, description = '_cluster', toplevel = 3) - } else { - gobject = update_giotto_params(gobject, description = '_cluster') - } + # weight is the same + network_edge_dt <- network_edge_dt[, c("from", "to"), with = FALSE] + network_edge_dt[, weight := 1] } - return(gobject) + # do python louvain clustering + if (louv_random == FALSE) { + reticulate::py_set_seed( + seed = seed_number, disable_hash_randomization = TRUE) + pyth_louv_result <- python_louvain( + df = network_edge_dt, resolution = resolution, randomize = FALSE) + } else { + reticulate::py_set_seed( + seed = seed_number, disable_hash_randomization = TRUE) + pyth_louv_result <- python_louvain( + df = network_edge_dt, + resolution = resolution, + random_state = seed_number) + } + ident_clusters_DT <- data.table::data.table( + cell_ID = rownames(pyth_louv_result), "name" = pyth_louv_result[[1]]) + data.table::setnames(ident_clusters_DT, "name", name) - } else { + ## return + if (isTRUE(return_gobject)) { + # get cell metadata names + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + + # set name/cluster results to NULL if already exist + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + 
copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata(gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } - # else return clustering result - return(ident_clusters_DT) - } + # add new metadata information + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, column_cell_ID = "cell_ID" + ) + ## update parameters used ## + + # 1. get parent function name + cl <- sys.call(-1) + + # 2. check if this function call is within doLouvainCluster + if (is.null(cl)) { + gobject <- update_giotto_params(gobject, description = "_cluster") + } else { + fname <- as.character(cl[[1]]) + if (fname == "doLouvainCluster") { + gobject <- update_giotto_params(gobject, + description = "_cluster", + toplevel = 3) + } else { + gobject <- update_giotto_params(gobject, + description = "_cluster") + } + } + + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } #' @title doLouvainCluster multinet #' @name .doLouvainCluster_multinet -#' @description cluster cells using a NN-network and the Louvain algorithm from the multinet package in R. +#' @description cluster cells using a NN-network and the Louvain algorithm from +#' the multinet package in R. #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. "cell") #' @param feat_type feature type (e.g. "rna", "dna", "protein") #' @param name name for cluster, default to "louvain_clus" -#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN" +#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to +#' "sNN" #' @param network_name name of NN network to use, default to "sNN.pca" -#' @param gamma Resolution parameter for modularity in the generalized louvain method. default = 1 -#' @param omega Inter-layer weight parameter in the generalized louvain method. default = 1 +#' @param gamma Resolution parameter for modularity in the generalized louvain +#' method. default = 1 +#' @param omega Inter-layer weight parameter in the generalized louvain method. +#' default = 1 #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param set_seed set seed (default = FALSE) #' @param seed_number number for seed #' @return giotto object with new clusters appended to cell metadata -#' @details See \code{\link[multinet]{glouvain_ml}} from the multinet package in R for -#' more information. +#' @details See \code{\link[multinet]{glouvain_ml}} from the multinet package +#' in R for more information. 
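The sys.call(-1) bookkeeping used in the Louvain wrappers above (deciding whether parameter logging should be attributed to a calling doLouvainCluster() rather than to the internal function) can be hard to follow in diff form. Below is a minimal, self-contained sketch of the same pattern; it is an editor's illustration and the function names are placeholders, not Giotto API.

log_params <- function() {
    cl <- sys.call(-1) # call that created the caller's frame, or NULL
    if (is.null(cl)) {
        message("called directly: record parameters at this level")
    } else if (as.character(cl[[1]]) == "wrapper") {
        message("called from wrapper(): attribute parameters to the wrapper")
    } else {
        message("called from another function: record normally")
    }
}
wrapper <- function() log_params()
log_params() # "called directly: ..."
wrapper()    # "called from wrapper(): ..."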
#' #' @keywords internal .doLouvainCluster_multinet <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - name = 'louvain_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - gamma = 1, - omega = 1, - return_gobject = TRUE, - set_seed = FALSE, - seed_number = 1234) { - - - if("multinet" %in% rownames(installed.packages()) == FALSE) { - stop("\n package 'multinet' is not yet installed \n", - "To install: \n", - "install.packages('multinet')" - ) - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID[[spat_unit]] - - ## select network to use - igraph_object = get_NearestNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - output = 'igraph') - - # create mlnetworkobject - mln_object <- multinet::ml_empty() - multinet::add_vertices_ml(n = mln_object, vertices = igraph::V(igraph_object)) - multinet::add_igraph_layer_ml(n = mln_object, g = igraph_object, name = name) - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = as.integer(seed_number)) - } - - # data.table variables - cell_ID = actor = weight_col = NULL - - louvain_clusters = multinet::glouvain_ml(n = mln_object, gamma = gamma, omega = omega) - ident_clusters_DT = data.table::as.data.table(louvain_clusters) - ident_clusters_DT[, cell_ID := actor] - data.table::setnames(ident_clusters_DT, 'cid', name) - - # exit seed - if(isTRUE(set_seed)) { - set.seed(Sys.time()) - } - - - ## return - if(isTRUE(return_gobject)) { - - # get cell metadata names - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - - - # set name/cluster results to NULL if already exist - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_metadata[][, eval(name) := NULL] - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = setCellMetadata(gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + spat_unit = NULL, + feat_type = NULL, + name = "louvain_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + gamma = 1, + omega = 1, + return_gobject = TRUE, + set_seed = FALSE, + seed_number = 1234) { + if ("multinet" %in% rownames(installed.packages()) == FALSE) { + stop( + "package 'multinet' is not yet installed \n", + "To install: \n", + "install.packages('multinet')" + ) } + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # add new metadata information - gobject = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, column_cell_ID = 'cell_ID') + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID[[spat_unit]] - ## update parameters used ## - # 1. 
get parent function name - cl = sys.call(-1) + ## select network to use + igraph_object <- getNearestNetwork( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + nn_type = nn_network_to_use, + name = network_name, + output = "igraph" + ) - # 2. check if this function call is within doLouvainCluster - if(is.null(cl)) { - gobject = update_giotto_params(gobject, description = '_cluster') - } else { - fname = as.character(cl[[1]]) - if(fname == 'doLouvainCluster') { - gobject = update_giotto_params(gobject, description = '_cluster', toplevel = 3) - } else { - gobject = update_giotto_params(gobject, description = '_cluster') - } + # create mlnetworkobject + mln_object <- multinet::ml_empty() + multinet::add_vertices_ml( + n = mln_object, vertices = igraph::V(igraph_object)) + multinet::add_igraph_layer_ml( + n = mln_object, g = igraph_object, name = name) + + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = as.integer(seed_number)) } - return(gobject) + # data.table variables + cell_ID <- actor <- weight_col <- NULL - } else { + louvain_clusters <- multinet::glouvain_ml( + n = mln_object, gamma = gamma, omega = omega) + ident_clusters_DT <- data.table::as.data.table(louvain_clusters) + ident_clusters_DT[, cell_ID := actor] + data.table::setnames(ident_clusters_DT, "cid", name) + + # exit seed + if (isTRUE(set_seed)) { + set.seed(Sys.time()) + } + + + ## return + if (isTRUE(return_gobject)) { + # get cell metadata names + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + + # set name/cluster results to NULL if already exist + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata(gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } - # else return clustering result - return(ident_clusters_DT) - } + # add new metadata information + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, column_cell_ID = "cell_ID" + ) + ## update parameters used ## + # 1. get parent function name + cl <- sys.call(-1) + + # 2. check if this function call is within doLouvainCluster + if (is.null(cl)) { + gobject <- update_giotto_params(gobject, description = "_cluster") + } else { + fname <- as.character(cl[[1]]) + if (fname == "doLouvainCluster") { + gobject <- update_giotto_params(gobject, + description = "_cluster", + toplevel = 3) + } else { + gobject <- update_giotto_params(gobject, + description = "_cluster") + } + } + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } @@ -760,95 +820,103 @@ doGiottoClustree <- function(gobject, #' @param feat_type feature type (e.g. 
"rna", "dna", "protein") #' @param version implemented version of Louvain clustering to use #' @param name name for cluster, default to "louvain_clus" -#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN" +#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to +#' "sNN" #' @param network_name name of NN network to use, default to "sNN.pca" #' @param python_path [community] specify specific path to python if required #' @param resolution [community] resolution, default = 1 -#' @param louv_random [community] Will randomize the node evaluation order and the community evaluation -#' order to get different partitions at each call (default = FALSE) +#' @param louv_random [community] Will randomize the node evaluation order and +#' the community evaluation order to get different partitions at each call +#' (default = FALSE) #' @param weight_col weight column name -#' @param gamma [multinet] Resolution parameter for modularity in the generalized louvain method, default = 1 -#' @param omega [multinet] Inter-layer weight parameter in the generalized louvain method, default = 1 +#' @param gamma [multinet] Resolution parameter for modularity in the +#' generalized louvain method, default = 1 +#' @param omega [multinet] Inter-layer weight parameter in the generalized +#' louvain method, default = 1 #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param set_seed set seed (default = FALSE) -#' @param ... arguments passed to \code{\link{.doLouvainCluster_community}} or \code{\link{.doLouvainCluster_multinet}} +#' @param ... arguments passed to \code{\link{.doLouvainCluster_community}} or +#' \code{\link{.doLouvainCluster_multinet}} #' @param seed_number number for seed #' #' @return giotto object with new clusters appended to cell metadata -#' @details Louvain clustering using the community or multinet implementation of the louvain clustering algorithm. -#' @seealso \code{\link{.doLouvainCluster_community}} and \code{\link{.doLouvainCluster_multinet}} +#' @details Louvain clustering using the community or multinet implementation +#' of the louvain clustering algorithm. +#' @seealso \code{\link{.doLouvainCluster_community}} and +#' \code{\link{.doLouvainCluster_multinet}} #' @export -doLouvainCluster = function(gobject, - spat_unit = NULL, - feat_type = NULL, - version = c('community', 'multinet'), - name = 'louvain_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - python_path = NULL, - resolution = 1, - weight_col = NULL, - gamma = 1, - omega = 1, - louv_random = FALSE, - return_gobject = TRUE, - set_seed = FALSE, - seed_number = 1234, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - - ## louvain clustering version to use - version = match.arg(version, c('community', 'multinet')) - - # python community implementation - if(version == 'community') { - - result = .doLouvainCluster_community(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - python_path = python_path, - resolution = resolution, - weight_col = weight_col, - louv_random = louv_random, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number, - ...) 
+doLouvainCluster <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + version = c("community", "multinet"), + name = "louvain_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + python_path = NULL, + resolution = 1, + weight_col = NULL, + gamma = 1, + omega = 1, + louv_random = FALSE, + return_gobject = TRUE, + set_seed = FALSE, + seed_number = 1234, + ...) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - return(result) - ## r multinet implementation - } else if(version == 'multinet') { - - result = .doLouvainCluster_multinet(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - gamma = gamma, - omega = omega, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number, - ...) - return(result) - } + ## louvain clustering version to use + version <- match.arg(version, c("community", "multinet")) + # python community implementation + if (version == "community") { + result <- .doLouvainCluster_community( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + python_path = python_path, + resolution = resolution, + weight_col = weight_col, + louv_random = louv_random, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number, + ... + ) + + return(result) + + ## r multinet implementation + } else if (version == "multinet") { + result <- .doLouvainCluster_multinet( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + gamma = gamma, + omega = omega, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + return(result) + } } @@ -858,7 +926,8 @@ doLouvainCluster = function(gobject, #' @description Cluster cells using a random walk approach. #' @param gobject giotto object #' @param name name for cluster, default to "random_walk_clus" -#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to "sNN" +#' @param nn_network_to_use type of NN network to use (kNN vs sNN), default to +#' "sNN" #' @param network_name name of NN network to use, default to "sNN.pca" #' @param walk_steps number of walking steps, default = 4 #' @param walk_clusters number of final clusters, default = 10 @@ -871,67 +940,74 @@ doLouvainCluster = function(gobject, #' package in R for more information. 
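To make the walktrap-based workflow described above concrete, here is a minimal usage sketch. It is illustrative only and not part of this diff: it assumes a giotto object `g` that already holds an sNN network named "sNN.pca" (for example, built earlier with createNearestNetwork), and `g` is a placeholder name.

# illustrative sketch, not part of this diff: `g` is an assumed giotto
# object with a pre-built sNN network called "sNN.pca"
g <- doRandomWalkCluster(
    gobject = g,
    name = "random_walk_clus",  # metadata column that will store the clusters
    nn_network_to_use = "sNN",
    network_name = "sNN.pca",
    walk_steps = 4,             # number of walktrap steps
    walk_clusters = 10,         # cut the walktrap result into 10 clusters
    set_seed = TRUE,
    seed_number = 1234
)
pDataDT(g)  # the new "random_walk_clus" column should now appear in cell metadata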
#' @export doRandomWalkCluster <- function(gobject, - name = 'random_walk_clus', - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - walk_steps = 4, - walk_clusters = 10, - walk_weights = NA, - return_gobject = TRUE, - set_seed = FALSE, - seed_number = 1234) { - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID - - ## select network to use - igraph_object = get_NearestNetwork(gobject, - nn_network_to_use = nn_network_to_use, - network_name = network_name) - - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - } - - randomwalk_clusters <- igraph::cluster_walktrap(graph = igraph_object, steps = walk_steps, weights = walk_weights) - randomwalk_clusters <- as.factor(igraph::cut_at(communities = randomwalk_clusters, no = walk_clusters)) - - ident_clusters_DT <- data.table::data.table('cell_ID' = igraph::V(igraph_object)$name, 'name' = randomwalk_clusters) - data.table::setnames(ident_clusters_DT, 'name', name) - - # exit seed - if(isTRUE(set_seed)) { - set.seed(Sys.time()) - } - - - ## return - if(return_gobject == TRUE) { - - cluster_names = names(gobject@cell_metadata) - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata - cell_metadata[, eval(name) := NULL] - gobject@cell_metadata = cell_metadata + name = "random_walk_clus", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + walk_steps = 4, + walk_clusters = 10, + walk_weights = NA, + return_gobject = TRUE, + set_seed = FALSE, + seed_number = 1234) { + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID + + ## select network to use + igraph_object <- getNearestNetwork( + gobject, + nn_type = nn_network_to_use, + name = network_name + ) + + + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) } - gobject = addCellMetadata(gobject = gobject, new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, column_cell_ID = 'cell_ID') + randomwalk_clusters <- igraph::cluster_walktrap( + graph = igraph_object, steps = walk_steps, weights = walk_weights) + randomwalk_clusters <- as.factor(igraph::cut_at( + communities = randomwalk_clusters, no = walk_clusters)) + ident_clusters_DT <- data.table::data.table( + "cell_ID" = igraph::V(igraph_object)$name, + "name" = randomwalk_clusters) + data.table::setnames(ident_clusters_DT, "name", name) + + # exit seed + if (isTRUE(set_seed)) { + set.seed(Sys.time()) + } - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_randomwalk_cluster') - return(gobject) - } else { + ## return + if (return_gobject == TRUE) { + cluster_names <- names(gobject@cell_metadata) + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata + cell_metadata[, eval(name) := NULL] + gobject@cell_metadata <- cell_metadata + } - # else return clustering result - return(ident_clusters_DT) + gobject <- addCellMetadata( + gobject = gobject, + new_metadata = ident_clusters_DT[, c("cell_ID", name), + with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) - } + + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_randomwalk_cluster") + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } @@ -940,12 +1016,17 @@ doRandomWalkCluster <- function(gobject, #' @description Cluster cells using a SNN cluster approach. 
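As a rough orientation for the shared nearest-neighbour clustering documented below, the following call is a hedged sketch rather than part of this diff; it assumes a giotto object `g` with an existing kNN network named "kNN.pca", and both the object name and the chosen parameter values are placeholders.

# illustrative sketch, not part of this diff: `g` is an assumed giotto
# object with a pre-built kNN network called "kNN.pca"
g <- doSNNCluster(
    gobject = g,
    name = "sNN_clus",
    nn_network_to_use = "kNN",  # sNNclust only works on a kNN network
    network_name = "kNN.pca",
    k = 20,                     # neighborhood size for SNN sparsification
    eps = 4,                    # minimum shared neighbors for reachability
    minPts = 16,                # minimum shared neighbors for a core point
    return_gobject = TRUE
)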
#' @param gobject giotto object #' @param name name for cluster, default to "sNN_clus" -#' @param nn_network_to_use type of NN network to use (only works on kNN), default to "kNN" +#' @param nn_network_to_use type of NN network to use (only works on kNN), +#' default to "kNN" #' @param network_name name of kNN network to use, default to "kNN.pca" -#' @param k Neighborhood size for nearest neighbor sparsification to create the shared NN graph, default = 20 -#' @param eps Two objects are only reachable from each other if they share at least eps nearest neighbors, default = 4 -#' @param minPts minimum number of points that share at least eps nearest neighbors for a point to be considered a core points, default = 16 -#' @param borderPoints should borderPoints be assigned to clusters like in DBSCAN? (default = TRUE) +#' @param k Neighborhood size for nearest neighbor sparsification to create +#' the shared NN graph, default = 20 +#' @param eps Two objects are only reachable from each other if they share at +#' least eps nearest neighbors, default = 4 +#' @param minPts minimum number of points that share at least eps nearest +#' neighbors for a point to be considered a core points, default = 16 +#' @param borderPoints should borderPoints be assigned to clusters like in +#' DBSCAN? (default = TRUE) #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param set_seed set seed (default = FALSE) #' @param seed_number number for seed @@ -953,91 +1034,98 @@ doRandomWalkCluster <- function(gobject, #' @details See \code{\link[dbscan]{sNNclust}} from dbscan package #' @export doSNNCluster <- function(gobject, - name = 'sNN_clus', - nn_network_to_use = 'kNN', - network_name = 'kNN.pca', - k = 20, - eps = 4, - minPts = 16, - borderPoints = TRUE, - return_gobject = TRUE, - set_seed = FALSE, - seed_number = 1234) { - - - ## get cell IDs ## - cell_ID_vec = gobject@cell_ID - - ## select network to use - igraph_object = get_NearestNetwork(gobject, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - output = 'igraph') - - - if(nn_network_to_use == 'sNN') { - stop('\n sNNclust can only be used with kNN-network \n') - } - - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - } - - # data.table variables - from = from_T = to_T = to = weight = distance = NULL - - ## SNN clust - igraph_DT = data.table::as.data.table(igraph::as_data_frame(igraph_object, what = 'edges')) - igraph_DT = igraph_DT[order(from)] - - cell_id_numeric = unique(x = c(igraph_DT$from, igraph_DT$to)) - names(cell_id_numeric) <- seq_along(cell_id_numeric) - igraph_DT[, from_T := as.numeric(names(cell_id_numeric[cell_id_numeric == from])), by = 1:nrow(igraph_DT)] - igraph_DT[, to_T := as.numeric(names(cell_id_numeric[cell_id_numeric == to])), by = 1:nrow(igraph_DT)] - temp_igraph_DT = igraph_DT[,.(from_T, to_T, weight, distance)] - data.table::setnames(temp_igraph_DT, old = c('from_T', 'to_T'), new = c('from', 'to')) - - kNN_object = nnDT_to_kNN(nnDT = temp_igraph_DT) - sNN_clusters = dbscan::sNNclust(x = kNN_object, k = k, eps = eps, - minPts = minPts, borderPoints = borderPoints) - - ident_clusters_DT <- data.table::data.table('cell_ID' = cell_id_numeric[1:nrow(kNN_object$dist)], 'name' = sNN_clusters$cluster) - data.table::setnames(ident_clusters_DT, 'name', name) - - # exit seed - if(isTRUE(set_seed)) { - set.seed(Sys.time()) - } - - ## add clusters to metadata ## - if(return_gobject == TRUE) { - - cluster_names = names(gobject@cell_metadata) - if(name %in% cluster_names) { - 
cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata - cell_metadata[, eval(name) := NULL] - gobject@cell_metadata = cell_metadata + name = "sNN_clus", + nn_network_to_use = "kNN", + network_name = "kNN.pca", + k = 20, + eps = 4, + minPts = 16, + borderPoints = TRUE, + return_gobject = TRUE, + set_seed = FALSE, + seed_number = 1234) { + ## get cell IDs ## + cell_ID_vec <- gobject@cell_ID + + ## select network to use + igraph_object <- getNearestNetwork( + gobject, + nn_type = nn_network_to_use, + name = network_name, + output = "igraph" + ) + + + if (nn_network_to_use == "sNN") { + stop("sNNclust can only be used with kNN-network") } - gobject = addCellMetadata(gobject = gobject, new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, column_cell_ID = 'cell_ID') + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_SNN_cluster') - return(gobject) + # data.table variables + from <- from_T <- to_T <- to <- weight <- distance <- NULL + + ## SNN clust + igraph_DT <- data.table::as.data.table(igraph::as_data_frame( + igraph_object, what = "edges")) + igraph_DT <- igraph_DT[order(from)] + + cell_id_numeric <- unique(x = c(igraph_DT$from, igraph_DT$to)) + names(cell_id_numeric) <- seq_along(cell_id_numeric) + igraph_DT[, from_T := as.numeric(names(cell_id_numeric[ + cell_id_numeric == from])), by = 1:nrow(igraph_DT)] + igraph_DT[, to_T := as.numeric(names(cell_id_numeric[ + cell_id_numeric == to])), by = 1:nrow(igraph_DT)] + temp_igraph_DT <- igraph_DT[, .(from_T, to_T, weight, distance)] + data.table::setnames( + temp_igraph_DT, old = c("from_T", "to_T"), new = c("from", "to")) + + kNN_object <- nnDT_to_kNN(nnDT = temp_igraph_DT) + sNN_clusters <- dbscan::sNNclust( + x = kNN_object, k = k, eps = eps, + minPts = minPts, borderPoints = borderPoints + ) + + ident_clusters_DT <- data.table::data.table( + "cell_ID" = cell_id_numeric[1:nrow(kNN_object$dist)], + "name" = sNN_clusters$cluster) + data.table::setnames(ident_clusters_DT, "name", name) + # exit seed + if (isTRUE(set_seed)) { + set.seed(Sys.time()) + } - } else { + ## add clusters to metadata ## + if (return_gobject == TRUE) { + cluster_names <- names(gobject@cell_metadata) + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata + cell_metadata[, eval(name) := NULL] + gobject@cell_metadata <- cell_metadata + } - # else return clustering result - return(ident_clusters_DT) + gobject <- addCellMetadata( + gobject = gobject, + new_metadata = ident_clusters_DT[, c("cell_ID", name), + with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) - } + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_SNN_cluster") + return(gobject) + } else { + # else return clustering result + return(ident_clusters_DT) + } } @@ -1050,12 +1138,16 @@ doSNNCluster <- function(gobject, #' @param gobject giotto object #' @param feat_type feature type (e.g. "cell") #' @param spat_unit spatial unit (e.g. "rna", "dna", "protein") -#' @param expression_values expression values to use (e.g. "normalized", "scaled", "custom") +#' @param expression_values expression values to use +#' (e.g. "normalized", "scaled", "custom") #' @param feats_to_use subset of features to use -#' @param dim_reduction_to_use dimension reduction to use (e.g. 
"cells", "pca", "umap", "tsne") +#' @param dim_reduction_to_use dimension reduction to use +#' (e.g. "cells", "pca", "umap", "tsne") #' @param dim_reduction_name dimensions reduction name, default to "pca" #' @param dimensions_to_use dimensions to use, default = 1:10 -#' @param distance_method distance method (e.g. "original", "pearson", "spearman", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski") +#' @param distance_method distance method (e.g. "original", "pearson", +#' "spearman", "euclidean", "maximum", "manhattan", "canberra", "binary", +#' "minkowski") #' @param centers number of final clusters, default = 10 #' @param iter_max kmeans maximum iterations, default = 100 #' @param nstart kmeans nstart, default = 1000 @@ -1069,157 +1161,178 @@ doSNNCluster <- function(gobject, #' @seealso \code{\link[stats]{kmeans}} #' @export doKmeans <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - feats_to_use = NULL, - dim_reduction_to_use = c('cells', 'pca', 'umap', 'tsne'), - dim_reduction_name = 'pca', - dimensions_to_use = 1:10, - distance_method = c("original", "pearson", "spearman", - "euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski"), - centers = 10, - iter_max = 100, - nstart = 1000, - algorithm = "Hartigan-Wong", - name = 'kmeans', - return_gobject = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - dim_reduction_to_use = match.arg(dim_reduction_to_use, choices = c('cells', 'pca', 'umap', 'tsne')) - distance_method = match.arg(distance_method, choices = c("original", "pearson", "spearman", - "euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski")) - - - ## using dimension reduction ## - if(dim_reduction_to_use != 'cells' & !is.null(dim_reduction_to_use)) { - - ## TODO: check if reduction exists - - # use only available dimensions if dimensions < dimensions_to_use - dim_coord = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = 'cells', - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'dimObj') - - dimensions_to_use = dimensions_to_use[dimensions_to_use %in% 1:ncol(dim_coord[])] - matrix_to_use = dim_coord[][, dimensions_to_use] - - - } else { - - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - ## using original matrix ## - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - # subset expression matrix - if(!is.null(feats_to_use)) { - expr_values[] = expr_values[][rownames(expr_values[]) %in% feats_to_use, ] + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + feats_to_use = NULL, + dim_reduction_to_use = c("cells", "pca", "umap", "tsne"), + dim_reduction_name = "pca", + dimensions_to_use = 1:10, + distance_method = c( + "original", "pearson", "spearman", + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + ), + centers = 10, + iter_max = 100, + nstart = 1000, + algorithm = "Hartigan-Wong", + name = "kmeans", + return_gobject = TRUE, + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- 
set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + + dim_reduction_to_use <- match.arg( + dim_reduction_to_use, choices = c("cells", "pca", "umap", "tsne")) + distance_method <- match.arg(distance_method, choices = c( + "original", "pearson", "spearman", + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + )) + + + ## using dimension reduction ## + if (dim_reduction_to_use != "cells" & !is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + + # use only available dimensions if dimensions < dimensions_to_use + dim_coord <- get_dimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = "cells", + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "dimObj" + ) + + dimensions_to_use <- dimensions_to_use[ + dimensions_to_use %in% 1:ncol(dim_coord[])] + matrix_to_use <- dim_coord[][, dimensions_to_use] + } else { + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + + ## using original matrix ## + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + # subset expression matrix + if (!is.null(feats_to_use)) { + expr_values[] <- expr_values[][ + rownames(expr_values[]) %in% feats_to_use, ] + } + + # features as columns + # cells as rows + matrix_to_use <- t_flex(expr_values[]) } - # features as columns - # cells as rows - matrix_to_use = t_flex(expr_values[]) - - } - - - ## distance - if(distance_method == 'original') { - celldist = matrix_to_use - } else if(distance_method %in% c('spearman', 'pearson')) { - celldist = stats::as.dist(1-cor_flex(x = t_flex(matrix_to_use), method = distance_method)) - } else if(distance_method %in% c("euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski")) { - celldist = stats::dist(x = matrix_to_use, method = distance_method) - } - - ## kmeans clustering - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = as.integer(seed_number)) - } - - # start clustering - kclusters = stats::kmeans(x = celldist, - centers = centers, - iter.max = iter_max, - nstart = nstart, - algorithm = algorithm) - - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } - - ident_clusters_DT = data.table::data.table(cell_ID = names(kclusters[['cluster']]), - 'name' = kclusters[['cluster']]) - data.table::setnames(ident_clusters_DT, 'name', name) - - - ## add clusters to metadata ## - if(isTRUE(return_gobject)) { - - #cluster_names = names(gobject@cell_metadata[[spat_unit]][[feat_type]]) - - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata <- getCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_metadata[][, eval(name) := NULL] - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = setCellMetadata(gobject = gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + ## distance + if (distance_method == "original") { + celldist <- matrix_to_use + } else if (distance_method %in% 
c("spearman", "pearson")) { + celldist <- stats::as.dist(1 - cor_flex( + x = t_flex(matrix_to_use), method = distance_method)) + } else if (distance_method %in% c( + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + )) { + celldist <- stats::dist(x = matrix_to_use, method = distance_method) + } + + ## kmeans clustering + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = as.integer(seed_number)) } - gobject = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID') + # start clustering + kclusters <- stats::kmeans( + x = celldist, + centers = centers, + iter.max = iter_max, + nstart = nstart, + algorithm = algorithm + ) - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_kmeans_cluster') - return(gobject) + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - } else { + ident_clusters_DT <- data.table::data.table( + cell_ID = names(kclusters[["cluster"]]), + "name" = kclusters[["cluster"]] + ) + data.table::setnames(ident_clusters_DT, "name", name) - return(ident_clusters_DT) - } + ## add clusters to metadata ## + if (isTRUE(return_gobject)) { + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata( + gobject = gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_kmeans_cluster") + return(gobject) + } else { + return(ident_clusters_DT) + } } @@ -1250,173 +1363,184 @@ doKmeans <- function(gobject, #' @seealso \code{\link[stats]{hclust}} #' @export doHclust <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - feats_to_use = NULL, - dim_reduction_to_use = c('cells', 'pca', 'umap', 'tsne'), - dim_reduction_name = 'pca', - dimensions_to_use = 1:10, - distance_method = c("pearson", "spearman", "original", - "euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski"), - agglomeration_method = c("ward.D2","ward.D", "single", - "complete", "average", "mcquitty", - "median", "centroid" ), - k = 10, - h = NULL, - name = 'hclust', - return_gobject = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - dim_reduction_to_use = match.arg( - dim_reduction_to_use, - choices = c('cells', 'pca', 'umap', 'tsne') - ) - distance_method = match.arg( - distance_method, - choices = c("pearson", "spearman", "original", - "euclidean", 
"maximum", "manhattan", - "canberra", "binary", "minkowski") - ) - agglomeration_method = match.arg( - agglomeration_method, - choices = c("ward.D2","ward.D", "single", - "complete", "average", "mcquitty", - "median", "centroid" ) - ) - values = match.arg(expression_values, c('normalized', 'scaled', 'custom')) - - - ## using dimension reduction ## - if(dim_reduction_to_use != 'cells' && !is.null(dim_reduction_to_use)) { - - ## TODO: check if reduction exists - - # use only available dimensions if dimensions < dimensions_to_use - dim_coord = get_dimReduction( - gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - reduction = 'cells', - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'data.table' + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + feats_to_use = NULL, + dim_reduction_to_use = c("cells", "pca", "umap", "tsne"), + dim_reduction_name = "pca", + dimensions_to_use = 1:10, + distance_method = c( + "pearson", "spearman", "original", + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + ), + agglomeration_method = c( + "ward.D2", "ward.D", "single", + "complete", "average", "mcquitty", + "median", "centroid" + ), + k = 10, + h = NULL, + name = "hclust", + return_gobject = TRUE, + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - - dimensions_to_use = dimensions_to_use[dimensions_to_use %in% 1:ncol(dim_coord)] - matrix_to_use = dim_coord[, dimensions_to_use] - } else { - ## using original matrix ## - expr_values = get_expression_values( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = "matrix" + dim_reduction_to_use <- match.arg( + dim_reduction_to_use, + choices = c("cells", "pca", "umap", "tsne") + ) + distance_method <- match.arg( + distance_method, + choices = c( + "pearson", "spearman", "original", + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + ) ) + agglomeration_method <- match.arg( + agglomeration_method, + choices = c( + "ward.D2", "ward.D", "single", + "complete", "average", "mcquitty", + "median", "centroid" + ) + ) + values <- match.arg(expression_values, c("normalized", "scaled", "custom")) + + + ## using dimension reduction ## + if (dim_reduction_to_use != "cells" && !is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + + # use only available dimensions if dimensions < dimensions_to_use + dim_coord <- getDimReduction( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + reduction = "cells", + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "data.table" + ) - # subset expression matrix - if(!is.null(feats_to_use)) { - expr_values = expr_values[rownames(expr_values) %in% feats_to_use, ] + dimensions_to_use <- dimensions_to_use[ + dimensions_to_use %in% 1:ncol(dim_coord)] + matrix_to_use <- dim_coord[, dimensions_to_use] + } else { + ## using original matrix ## + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + + # subset expression matrix + if (!is.null(feats_to_use)) { + expr_values <- expr_values[ + rownames(expr_values) %in% feats_to_use, ] + } + + # features as columns + # cells as 
rows + matrix_to_use <- t_flex(expr_values) } - # features as columns - # cells as rows - matrix_to_use = t_flex(expr_values) - - } - - ## distance - if(distance_method == 'original') { - celldist = matrix_to_use - } else if(distance_method %in% c('spearman', 'pearson')) { - celldist = stats::as.dist(1-cor_flex(x = t_flex(matrix_to_use), method = distance_method)) - } else if(distance_method %in% c("euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski")) { - celldist = stats::dist(x = matrix_to_use, method = distance_method) - } - - ## hierarchical clustering - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = as.integer(seed_number)) - } - - # start clustering - hclusters = stats::hclust(d = celldist, method = agglomeration_method) - hclusters_cut = stats::cutree(tree = hclusters, k = k, h = h) - - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } - - ident_clusters_DT = data.table::data.table(cell_ID = names(hclusters_cut), - 'name' = hclusters_cut) - data.table::setnames(ident_clusters_DT, 'name', name) - - - ## add clusters to metadata ## - if(return_gobject == TRUE) { - - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) - #cluster_names = names(gobject@cell_metadata[[spat_unit]][[feat_type]]) - - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_metadata[][, eval(name) := NULL] - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject <- setCellMetadata(gobject, - x = cell_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + ## distance + if (distance_method == "original") { + celldist <- matrix_to_use + } else if (distance_method %in% c("spearman", "pearson")) { + celldist <- stats::as.dist(1 - cor_flex(x = t_flex( + matrix_to_use), method = distance_method)) + } else if (distance_method %in% c( + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + )) { + celldist <- stats::dist(x = matrix_to_use, method = distance_method) } - gobject <- addCellMetadata( - gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - new_metadata = ident_clusters_DT[, c('cell_ID', name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' - ) + ## hierarchical clustering + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = as.integer(seed_number)) + } + # start clustering + hclusters <- stats::hclust(d = celldist, method = agglomeration_method) + hclusters_cut <- stats::cutree(tree = hclusters, k = k, h = h) + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_hierarchical_cluster') - return(gobject) + ident_clusters_DT <- data.table::data.table( + cell_ID = names(hclusters_cut), + "name" = hclusters_cut + ) + data.table::setnames(ident_clusters_DT, "name", name) - } else { - return(list('hclust' = hclusters, 'DT' = ident_clusters_DT)) + ## add clusters to metadata ## + if (return_gobject == TRUE) { + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- getCellMetadata(gobject, + spat_unit = 
spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + + cell_metadata[][, eval(name) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setCellMetadata(gobject, + x = cell_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + + gobject <- addCellMetadata( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + new_metadata = ident_clusters_DT[, c("cell_ID", name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) - } + + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_hierarchical_cluster") + return(gobject) + } else { + return(list("hclust" = hclusters, "DT" = ident_clusters_DT)) + } } @@ -1467,189 +1591,189 @@ doHclust <- function(gobject, #' @param seed_number number for seed #' @return giotto object with new clusters appended to cell metadata #' @details Wrapper for the different clustering methods. -#' @seealso \code{\link{doLeidenCluster}}, \code{\link{.doLouvainCluster_community}}, \code{\link{.doLouvainCluster_multinet}}, -#' \code{\link{doLouvainCluster}}, \code{\link{doRandomWalkCluster}}, \code{\link{doSNNCluster}}, +#' @seealso \code{\link{doLeidenCluster}}, +#' \code{\link{.doLouvainCluster_community}}, +#' \code{\link{.doLouvainCluster_multinet}}, +#' \code{\link{doLouvainCluster}}, \code{\link{doRandomWalkCluster}}, +#' \code{\link{doSNNCluster}}, #' \code{\link{doKmeans}}, \code{\link{doHclust}} #' @export clusterCells <- function(gobject, - cluster_method = c('leiden', - 'louvain_community', 'louvain_multinet', - 'randomwalk', 'sNNclust', - 'kmeans', 'hierarchical'), - name = 'cluster_name', - - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - - pyth_leid_resolution = 1, - pyth_leid_weight_col = 'weight', - pyth_leid_part_type = c('RBConfigurationVertexPartition', 'ModularityVertexPartition'), - pyth_leid_init_memb = NULL, - pyth_leid_iterations = 1000, - - pyth_louv_resolution = 1, - pyth_louv_weight_col = NULL, - python_louv_random = F, - - python_path = NULL, - - louvain_gamma = 1, - louvain_omega = 1, - - walk_steps = 4, - walk_clusters = 10, - walk_weights = NA, - - sNNclust_k = 20, - sNNclust_eps = 4, - sNNclust_minPts = 16, - borderPoints = TRUE, - - expression_values = c('normalized', 'scaled', 'custom'), - feats_to_use = NULL, - dim_reduction_to_use = c('cells', 'pca', 'umap', 'tsne'), - dim_reduction_name = 'pca', - dimensions_to_use = 1:10, - distance_method = c("original", "pearson", "spearman", - "euclidean", "maximum", "manhattan", - "canberra", "binary", "minkowski"), - km_centers = 10, - km_iter_max = 100, - km_nstart = 1000, - km_algorithm = "Hartigan-Wong", - - hc_agglomeration_method = c("ward.D2","ward.D", "single", - "complete", "average", "mcquitty", - "median", "centroid" ), - hc_k = 10, - hc_h = NULL, - - - return_gobject = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - ## select cluster method - cluster_method = match.arg( - arg = cluster_method, - choices = c('leiden', - 'louvain_community', - 'louvain_multinet', - 'randomwalk', - 'sNNclust', - 'kmeans', - 'hierarchical')) - - - if(cluster_method == 'leiden') { - - result = doLeidenCluster(gobject = gobject, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - python_path = python_path, - resolution = pyth_leid_resolution, - weight_col = pyth_leid_weight_col, - partition_type = pyth_leid_part_type, - init_membership = 
pyth_leid_init_memb, - n_iterations = pyth_leid_iterations, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'louvain_community') { - - result = .doLouvainCluster_community(gobject = gobject, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - python_path = python_path, - resolution = pyth_louv_resolution, - weight_col = pyth_louv_weight_col, - louv_random = python_louv_random, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'louvain_multinet') { - - result = .doLouvainCluster_multinet(gobject = gobject, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - gamma = louvain_gamma, - omega = louvain_omega, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'randomwalk') { - - result = doRandomWalkCluster(gobject = gobject, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - walk_steps = walk_steps, - walk_clusters = walk_clusters, - walk_weights = walk_weights, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'sNNclust') { - - result = doSNNCluster(gobject = gobject, - name = name, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - k = sNNclust_k, - eps = sNNclust_eps, - minPts = sNNclust_minPts, - borderPoints = borderPoints, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'kmeans') { - - result = doKmeans(gobject = gobject, - name = name, - expression_values = expression_values, - feats_to_use = feats_to_use, - dim_reduction_to_use = dim_reduction_to_use, - dim_reduction_name = dim_reduction_name, - dimensions_to_use = dimensions_to_use, - distance_method = distance_method, - centers = km_centers, - iter_max = km_iter_max, - nstart = km_nstart, - algorithm = km_algorithm, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } else if(cluster_method == 'hierarchical') { - - result = doHclust(gobject = gobject, - name = name, - expression_values = expression_values, - feats_to_use = feats_to_use, - dim_reduction_to_use = dim_reduction_to_use, - dim_reduction_name = dim_reduction_name, - dimensions_to_use = dimensions_to_use, - distance_method = distance_method, - agglomeration_method = hc_agglomeration_method, - k = hc_k, - h = hc_h, - return_gobject = return_gobject, - set_seed = set_seed, - seed_number = seed_number) - - } - - - return(result) + cluster_method = c( + "leiden", + "louvain_community", "louvain_multinet", + "randomwalk", "sNNclust", + "kmeans", "hierarchical" + ), + name = "cluster_name", + nn_network_to_use = "sNN", + network_name = "sNN.pca", + pyth_leid_resolution = 1, + pyth_leid_weight_col = "weight", + pyth_leid_part_type = c("RBConfigurationVertexPartition", + "ModularityVertexPartition"), + pyth_leid_init_memb = NULL, + pyth_leid_iterations = 1000, + pyth_louv_resolution = 1, + pyth_louv_weight_col = NULL, + python_louv_random = FALSE, + python_path = NULL, + louvain_gamma = 1, + louvain_omega = 1, + walk_steps = 4, + walk_clusters = 10, + walk_weights = NA, + sNNclust_k = 20, + sNNclust_eps = 4, + sNNclust_minPts = 16, + borderPoints = TRUE, + expression_values = c("normalized", "scaled", "custom"), + feats_to_use = NULL, + dim_reduction_to_use = c("cells", 
"pca", "umap", "tsne"), + dim_reduction_name = "pca", + dimensions_to_use = 1:10, + distance_method = c( + "original", "pearson", "spearman", + "euclidean", "maximum", "manhattan", + "canberra", "binary", "minkowski" + ), + km_centers = 10, + km_iter_max = 100, + km_nstart = 1000, + km_algorithm = "Hartigan-Wong", + hc_agglomeration_method = c( + "ward.D2", "ward.D", "single", + "complete", "average", "mcquitty", + "median", "centroid" + ), + hc_k = 10, + hc_h = NULL, + return_gobject = TRUE, + set_seed = TRUE, + seed_number = 1234) { + ## select cluster method + cluster_method <- match.arg( + arg = cluster_method, + choices = c( + "leiden", + "louvain_community", + "louvain_multinet", + "randomwalk", + "sNNclust", + "kmeans", + "hierarchical" + ) + ) + + + if (cluster_method == "leiden") { + result <- doLeidenCluster( + gobject = gobject, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + python_path = python_path, + resolution = pyth_leid_resolution, + weight_col = pyth_leid_weight_col, + partition_type = pyth_leid_part_type, + init_membership = pyth_leid_init_memb, + n_iterations = pyth_leid_iterations, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "louvain_community") { + result <- .doLouvainCluster_community( + gobject = gobject, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + python_path = python_path, + resolution = pyth_louv_resolution, + weight_col = pyth_louv_weight_col, + louv_random = python_louv_random, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "louvain_multinet") { + result <- .doLouvainCluster_multinet( + gobject = gobject, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + gamma = louvain_gamma, + omega = louvain_omega, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "randomwalk") { + result <- doRandomWalkCluster( + gobject = gobject, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + walk_steps = walk_steps, + walk_clusters = walk_clusters, + walk_weights = walk_weights, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "sNNclust") { + result <- doSNNCluster( + gobject = gobject, + name = name, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + k = sNNclust_k, + eps = sNNclust_eps, + minPts = sNNclust_minPts, + borderPoints = borderPoints, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "kmeans") { + result <- doKmeans( + gobject = gobject, + name = name, + expression_values = expression_values, + feats_to_use = feats_to_use, + dim_reduction_to_use = dim_reduction_to_use, + dim_reduction_name = dim_reduction_name, + dimensions_to_use = dimensions_to_use, + distance_method = distance_method, + centers = km_centers, + iter_max = km_iter_max, + nstart = km_nstart, + algorithm = km_algorithm, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (cluster_method == "hierarchical") { + result <- doHclust( + gobject = gobject, + name = name, + expression_values = expression_values, + feats_to_use = feats_to_use, + dim_reduction_to_use = dim_reduction_to_use, + dim_reduction_name = 
dim_reduction_name, + dimensions_to_use = dimensions_to_use, + distance_method = distance_method, + agglomeration_method = hc_agglomeration_method, + k = hc_k, + h = hc_h, + return_gobject = return_gobject, + set_seed = set_seed, + seed_number = seed_number + ) + } + + return(result) } @@ -1659,7 +1783,8 @@ clusterCells <- function(gobject, #' @title doLeidenSubCluster #' @name doLeidenSubCluster -#' @description Further subcluster cells using a NN-network and the Leiden algorithm +#' @description Further subcluster cells using a NN-network and the Leiden +#' algorithm #' @param gobject giotto object #' @param feat_type feature type #' @param name name for new clustering result @@ -1669,11 +1794,14 @@ clusterCells <- function(gobject, #' @param hvg_param deprecatd, use hvf_param #' @param hvf_min_perc_cells threshold for detection in min percentage of cells #' @param hvg_min_perc_cells deprecated, use hvf_min_perc_cells -#' @param hvf_mean_expr_det threshold for mean expression level in cells with detection +#' @param hvf_mean_expr_det threshold for mean expression level in cells with +#' detection #' @param hvg_mean_expr_det deprecated, use hvf_mean_expr_det -#' @param use_all_feats_as_hvf forces all features to be HVF and to be used as input for PCA +#' @param use_all_feats_as_hvf forces all features to be HVF and to be used as +#' input for PCA #' @param use_all_genes_as_hvg deprecated, use use_all_feats_as_hvf -#' @param min_nr_of_hvf minimum number of HVF, or all features will be used as input for PCA +#' @param min_nr_of_hvf minimum number of HVF, or all features will be used as +#' input for PCA #' @param min_nr_of_hvg deprecated, use min_nr_of_hvf #' @param pca_param parameters for runPCA #' @param nn_param parameters for parameters for createNearestNetwork @@ -1686,7 +1814,8 @@ clusterCells <- function(gobject, #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param verbose verbose #' @return giotto object with new subclusters appended to cell metadata -#' @details This function performs subclustering using the Leiden algorithm on selected clusters. +#' @details This function performs subclustering using the Leiden algorithm on +#' selected clusters. #' The systematic steps are: #' \itemize{ #' \item{1. 
subset Giotto object} @@ -1697,191 +1826,212 @@ clusterCells <- function(gobject, #' } #' @seealso \code{\link{doLeidenCluster}} #' @export -doLeidenSubCluster = function(gobject, - feat_type = NULL, - name = 'sub_pleiden_clus', - cluster_column = NULL, - selected_clusters = NULL, - hvf_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, expression_values = 'normalized'), - hvg_param = NULL, - hvf_min_perc_cells = 5, - hvg_min_perc_cells = NULL, - hvf_mean_expr_det = 1, - hvg_mean_expr_det = NULL, - use_all_feats_as_hvf = FALSE, - use_all_genes_as_hvg = NULL, - min_nr_of_hvf = 5, - min_nr_of_hvg = NULL, - pca_param = list(expression_values = 'normalized', scale_unit = TRUE), - nn_param = list(dimensions_to_use = 1:20), - k_neighbors = 10, - resolution = 0.5, - n_iterations = 500, - python_path = NULL, - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - return_gobject = TRUE, - verbose = TRUE) { - - - # specify feat_type - if(is.null(feat_type)) { - feat_type = gobject@expression_feat[[1]] - } - - ## deprecated arguments - if(!is.null(hvg_param)) { - hvf_param = hvg_param - warning('hvg_param is deprecated, use hvf_param in the future \n') - } - if(!is.null(hvg_min_perc_cells)) { - hvf_min_perc_cells = hvg_min_perc_cells - warning('hvg_min_perc_cells is deprecated, use hvf_min_perc_cells in the future \n') - } - if(!is.null(hvg_mean_expr_det)) { - hvf_mean_expr_det = hvg_mean_expr_det - warning('hvg_mean_expr_det is deprecated, use hvf_mean_expr_det in the future \n') - } - if(!is.null(use_all_genes_as_hvg)) { - use_all_feats_as_hvf = use_all_genes_as_hvg - warning('use_all_genes_as_hvg is deprecated, use use_all_feats_as_hvf in the future \n') - } - if(!is.null(min_nr_of_hvg)) { - min_nr_of_hvf = min_nr_of_hvg - warning('min_nr_of_hvg is deprecated, use min_nr_of_hvf in the future \n') - } - - - - iter_list = list() - - cell_metadata = pDataDT(gobject, - feat_type = feat_type) - - if(is.null(cluster_column)) { - stop('\n You need to provide a cluster column to subcluster on \n') - } - unique_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) - - - # data.table variables - hvf = perc_cells = mean_expr_det = parent_cluster = comb = tempclus = NULL - - - for (cluster in unique_clusters) { - - if(verbose == TRUE) cat('\n start with cluster: ', cluster, '\n') - - ## get subset - subset_cell_IDs = cell_metadata[get(cluster_column) == cluster][['cell_ID']] - temp_giotto = subsetGiotto(gobject = gobject, - feat_type = feat_type, - cell_ids = subset_cell_IDs) - - ## if cluster is not selected - if(!is.null(selected_clusters) & !cluster %in% selected_clusters) { - - temp_cluster = data.table('cell_ID' = subset_cell_IDs, 'tempclus' = 1, 'parent_cluster' = cluster) - iter_list[[cluster]] = temp_cluster - - } else { - # continue for selected clusters or all clusters if there is no selection - - ## calculate stats - temp_giotto <- addStatistics(gobject = temp_giotto, - feat_type = feat_type) - - ## calculate variable feats - temp_giotto = do.call('calculateHVF', c(gobject = temp_giotto, hvf_param)) - - ## get hvg - feat_metadata = fDataDT(temp_giotto, - feat_type = feat_type) - featfeats = feat_metadata[hvf == 'yes' & perc_cells >= hvf_min_perc_cells & mean_expr_det >= hvf_mean_expr_det]$feat_ID - - ## catch too low number of hvg - if(use_all_feats_as_hvf == TRUE) { - featfeats == feat_metadata$feat_ID - } else { - if(verbose == TRUE) cat('\n', length(featfeats), 'highly variable feats have been selected \n') - if(length(featfeats) <= min_nr_of_hvf) { - cat('\n too few feats, 
will continue with all feats instead \n') - featfeats = feat_metadata$feat_ID - } - } - - ## run PCA - temp_giotto = do.call('runPCA', c(gobject = temp_giotto, feats_to_use = list(featfeats), pca_param)) - - ## nearest neighbor and clustering - temp_giotto = do.call('createNearestNetwork', c(gobject = temp_giotto, k = k_neighbors, nn_param)) +doLeidenSubCluster <- function(gobject, + feat_type = NULL, + name = "sub_pleiden_clus", + cluster_column = NULL, + selected_clusters = NULL, + hvf_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, + expression_values = "normalized"), + hvg_param = NULL, + hvf_min_perc_cells = 5, + hvg_min_perc_cells = NULL, + hvf_mean_expr_det = 1, + hvg_mean_expr_det = NULL, + use_all_feats_as_hvf = FALSE, + use_all_genes_as_hvg = NULL, + min_nr_of_hvf = 5, + min_nr_of_hvg = NULL, + pca_param = list(expression_values = "normalized", scale_unit = TRUE), + nn_param = list(dimensions_to_use = 1:20), + k_neighbors = 10, + resolution = 0.5, + n_iterations = 500, + python_path = NULL, + nn_network_to_use = "sNN", + network_name = "sNN.pca", + return_gobject = TRUE, + verbose = TRUE) { + # specify feat_type + if (is.null(feat_type)) { + feat_type <- gobject@expression_feat[[1]] + } - ## Leiden Cluster - ## TO DO: expand to all clustering options - temp_cluster = doLeidenCluster(gobject = temp_giotto, - feat_type = feat_type, - resolution = resolution, - n_iterations = n_iterations, - python_path = python_path, - name = 'tempclus', - return_gobject = F) + ## deprecated arguments + if (!is.null(hvg_param)) { + hvf_param <- hvg_param + warning("hvg_param is deprecated, use hvf_param in the future") + } + if (!is.null(hvg_min_perc_cells)) { + hvf_min_perc_cells <- hvg_min_perc_cells + warning("hvg_min_perc_cells is deprecated, use hvf_min_perc_cells in + the future") + } + if (!is.null(hvg_mean_expr_det)) { + hvf_mean_expr_det <- hvg_mean_expr_det + warning("hvg_mean_expr_det is deprecated, use hvf_mean_expr_det in the + future") + } + if (!is.null(use_all_genes_as_hvg)) { + use_all_feats_as_hvf <- use_all_genes_as_hvg + warning("use_all_genes_as_hvg is deprecated, use use_all_feats_as_hvf + in the future") + } + if (!is.null(min_nr_of_hvg)) { + min_nr_of_hvf <- min_nr_of_hvg + warning("min_nr_of_hvg is deprecated, use min_nr_of_hvf in the future") + } - temp_cluster[, parent_cluster := cluster] - iter_list[[cluster]] = temp_cluster + iter_list <- list() + cell_metadata <- pDataDT(gobject, + feat_type = feat_type + ) + if (is.null(cluster_column)) { + stop("You need to provide a cluster column to subcluster on") } + unique_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) - } - together = do.call('rbind', iter_list) - together[, comb := paste0(parent_cluster,'.',tempclus)] + # data.table variables + hvf <- perc_cells <- mean_expr_det <- parent_cluster <- comb <- + tempclus <- NULL - # rename with subcluster of original name - #new_cluster_column = paste0(cluster_column,'_sub') - setnames(together, 'comb', name) + for (cluster in unique_clusters) { + if (verbose == TRUE) cat("start with cluster: ", cluster) - if(return_gobject == TRUE) { + ## get subset + subset_cell_IDs <- cell_metadata[ + get(cluster_column) == cluster][["cell_ID"]] + temp_giotto <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + cell_ids = subset_cell_IDs + ) - cluster_names = names(gobject@cell_metadata[[feat_type]]) - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata[[feat_type]] 
- cell_metadata[, eval(name) := NULL] - gobject@cell_metadata[[feat_type]] = cell_metadata + ## if cluster is not selected + if (!is.null(selected_clusters) & !cluster %in% selected_clusters) { + temp_cluster <- data.table( + "cell_ID" = subset_cell_IDs, + "tempclus" = 1, + "parent_cluster" = cluster) + iter_list[[cluster]] <- temp_cluster + } else { + # continue for selected clusters or all clusters if there is no + # selection + + ## calculate stats + temp_giotto <- addStatistics( + gobject = temp_giotto, + feat_type = feat_type + ) + + ## calculate variable feats + temp_giotto <- do.call( + "calculateHVF", c(gobject = temp_giotto, hvf_param)) + + ## get hvg + feat_metadata <- fDataDT(temp_giotto, + feat_type = feat_type + ) + featfeats <- feat_metadata[ + hvf == "yes" & perc_cells >= hvf_min_perc_cells & + mean_expr_det >= hvf_mean_expr_det]$feat_ID + + ## catch too low number of hvg + if (use_all_feats_as_hvf == TRUE) { + featfeats == feat_metadata$feat_ID + } else { + if (verbose == TRUE) + cat(length(featfeats), + "highly variable feats have been selected") + if (length(featfeats) <= min_nr_of_hvf) { + message("too few feats, will continue with all feats + instead") + featfeats <- feat_metadata$feat_ID + } + } + + ## run PCA + temp_giotto <- do.call( + "runPCA", + c(gobject = temp_giotto, feats_to_use = list(featfeats), + pca_param)) + + ## nearest neighbor and clustering + temp_giotto <- do.call( + "createNearestNetwork", + c(gobject = temp_giotto, k = k_neighbors, nn_param)) + + ## Leiden Cluster + ## TO DO: expand to all clustering options + temp_cluster <- doLeidenCluster( + gobject = temp_giotto, + feat_type = feat_type, + resolution = resolution, + n_iterations = n_iterations, + python_path = python_path, + name = "tempclus", + return_gobject = FALSE + ) + + temp_cluster[, parent_cluster := cluster] + + iter_list[[cluster]] <- temp_cluster + } } - gobject <- addCellMetadata(gobject, - feat_type = feat_type, - new_metadata = together[, c('cell_ID', name), with = F], - by_column = TRUE, column_cell_ID = 'cell_ID') + together <- do.call("rbind", iter_list) + together[, comb := paste0(parent_cluster, ".", tempclus)] + + # rename with subcluster of original name + setnames(together, "comb", name) - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_sub_cluster') - return(gobject) - } else { - return(together) - } + if (return_gobject == TRUE) { + cluster_names <- names(gobject@cell_metadata[[feat_type]]) + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata[[feat_type]] + cell_metadata[, eval(name) := NULL] + gobject@cell_metadata[[feat_type]] <- cell_metadata + } + gobject <- addCellMetadata(gobject, + feat_type = feat_type, + new_metadata = together[, c("cell_ID", name), with = FALSE], + by_column = TRUE, column_cell_ID = "cell_ID" + ) + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_sub_cluster") + return(gobject) + } else { + return(together) + } } #' @title doLouvainSubCluster community #' @name .doLouvainSubCluster_community -#' @description subcluster cells using a NN-network and the Louvain community detection algorithm +#' @description subcluster cells using a NN-network and the Louvain community +#' detection algorithm #' @param gobject giotto object #' @param name name for new clustering result #' @param cluster_column cluster column to subcluster #' @param selected_clusters only do subclustering on these clusters #' 
@param hvg_param parameters for calculateHVG #' @param hvg_min_perc_cells threshold for detection in min percentage of cells -#' @param hvg_mean_expr_det threshold for mean expression level in cells with detection -#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as input for PCA -#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as input for PCA +#' @param hvg_mean_expr_det threshold for mean expression level in cells with +#' detection +#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as +#' input for PCA +#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as +#' input for PCA #' @param pca_param parameters for runPCA #' @param nn_param parameters for parameters for createNearestNetwork #' @param k_neighbors number of k for createNearestNetwork @@ -1889,10 +2039,11 @@ doLeidenSubCluster = function(gobject, #' @param python_path specify specific path to python if required #' @param nn_network_to_use type of NN network to use (kNN vs sNN) #' @param network_name name of NN network to use -#' @param return_gobject boolean: return giotto object (default = TRUE) +#' @param return_gobject Boolean: return giotto object (default = TRUE) #' @param verbose verbose #' @return giotto object with new subclusters appended to cell metadata -#' @details This function performs subclustering using the Louvain community algorithm on selected clusters. +#' @details This function performs subclustering using the Louvain community +#' algorithm on selected clusters. #' The systematic steps are: #' \itemize{ #' \item{1. subset Giotto object} @@ -1903,150 +2054,162 @@ doLeidenSubCluster = function(gobject, #' } #' @seealso \code{\link{.doLouvainCluster_community}} #' @keywords internal -.doLouvainSubCluster_community = function(gobject, - name = 'sub_louvain_comm_clus', - cluster_column = NULL, - selected_clusters = NULL, - hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, expression_values = 'normalized'), - hvg_min_perc_cells = 5, - hvg_mean_expr_det = 1, - use_all_genes_as_hvg = FALSE, - min_nr_of_hvg = 5, - pca_param = list(expression_values = 'normalized', scale_unit = T), - nn_param = list(dimensions_to_use = 1:20), - k_neighbors = 10, - resolution = 0.5, - python_path = NULL, - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - return_gobject = TRUE, - verbose = T) { - - - iter_list = list() - - cell_metadata = pDataDT(gobject) - - if(is.null(cluster_column)) { - stop('\n You need to provide a cluster column to subcluster on \n') - } - unique_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) - - ## if clusters start at 0, then add +1 for the index ## - index_offset = ifelse(0 %in% unique_clusters, 1, 0) - - for(cluster in unique_clusters) { - - if(verbose == TRUE) cat('\n start with cluster: ', cluster, '\n') - - ## get subset - subset_cell_IDs = cell_metadata[get(cluster_column) == cluster][['cell_ID']] - temp_giotto = subsetGiotto(gobject = gobject, cell_ids = subset_cell_IDs) - - ## if cluster is not selected - if(!is.null(selected_clusters) & !cluster %in% selected_clusters) { - - temp_cluster = data.table('cell_ID' = subset_cell_IDs, 'tempclus' = 1, 'parent_cluster' = cluster) - iter_list[[cluster+index_offset]] = temp_cluster - - } else { - # continue for selected clusters or all clusters if there is no selection - - ## calculate stats - temp_giotto <- addStatistics(gobject = temp_giotto) - - ## calculate variable genes - temp_giotto = do.call('calculateHVG', c(gobject = 
temp_giotto, hvg_param)) - - ## get hvg - gene_metadata = fDataDT(temp_giotto) - - # data.table variables - hvg = perc_cells = mean_expr_det = NULL - - featgenes = gene_metadata[hvg == 'yes' & perc_cells >= hvg_min_perc_cells & mean_expr_det >= hvg_mean_expr_det]$gene_ID - - ## catch too low number of hvg - if(use_all_genes_as_hvg == TRUE) { - featgenes == gene_metadata$gene_ID - } else { - if(verbose == TRUE) cat('\n', length(featgenes), 'highly variable genes have been selected \n') - if(length(featgenes) <= min_nr_of_hvg) { - cat('\n too few genes, will continue with all genes instead \n') - featgenes = gene_metadata$gene_ID +.doLouvainSubCluster_community <- function(gobject, + name = "sub_louvain_comm_clus", + cluster_column = NULL, + selected_clusters = NULL, + hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, + expression_values = "normalized"), + hvg_min_perc_cells = 5, + hvg_mean_expr_det = 1, + use_all_genes_as_hvg = FALSE, + min_nr_of_hvg = 5, + pca_param = list(expression_values = "normalized", scale_unit = TRUE), + nn_param = list(dimensions_to_use = 1:20), + k_neighbors = 10, + resolution = 0.5, + python_path = NULL, + nn_network_to_use = "sNN", + network_name = "sNN.pca", + return_gobject = TRUE, + verbose = TRUE) { + iter_list <- list() + + cell_metadata <- pDataDT(gobject) + + if (is.null(cluster_column)) { + stop("You need to provide a cluster column to subcluster on") + } + unique_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) + + ## if clusters start at 0, then add +1 for the index ## + index_offset <- ifelse(0 %in% unique_clusters, 1, 0) + + for (cluster in unique_clusters) { + if (verbose == TRUE) cat("start with cluster: ", cluster) + + ## get subset + subset_cell_IDs <- cell_metadata[ + get(cluster_column) == cluster][["cell_ID"]] + temp_giotto <- subsetGiotto(gobject = gobject, + cell_ids = subset_cell_IDs) + + ## if cluster is not selected + if (!is.null(selected_clusters) & !cluster %in% selected_clusters) { + temp_cluster <- data.table( + "cell_ID" = subset_cell_IDs, + "tempclus" = 1, + "parent_cluster" = cluster) + iter_list[[cluster + index_offset]] <- temp_cluster + } else { + # continue for selected clusters or all clusters if there is no + # selection + + ## calculate stats + temp_giotto <- addStatistics(gobject = temp_giotto) + + ## calculate variable genes + temp_giotto <- do.call( + "calculateHVG", c(gobject = temp_giotto, hvg_param)) + + ## get hvg + gene_metadata <- fDataDT(temp_giotto) + + # data.table variables + hvg <- perc_cells <- mean_expr_det <- NULL + + featgenes <- gene_metadata[ + hvg == "yes" & perc_cells >= hvg_min_perc_cells & + mean_expr_det >= hvg_mean_expr_det]$gene_ID + + ## catch too low number of hvg + if (use_all_genes_as_hvg == TRUE) { + featgenes == gene_metadata$gene_ID + } else { + if (verbose == TRUE) + cat(length(featgenes), + "highly variable genes have been selected") + if (length(featgenes) <= min_nr_of_hvg) { + message("too few genes, will continue with all genes + instead") + featgenes <- gene_metadata$gene_ID + } + } + + ## run PCA + temp_giotto <- do.call( + "runPCA", + c(gobject = temp_giotto, genes_to_use = list(featgenes), + pca_param)) + + ## nearest neighbor and clustering + temp_giotto <- do.call( + "createNearestNetwork", + c(gobject = temp_giotto, k = k_neighbors, nn_param)) + + ## Leiden Cluster + ## TO DO: expand to all clustering options + temp_cluster <- .doLouvainCluster_community( + gobject = temp_giotto, + resolution = resolution, + python_path = python_path, + name = 
"tempclus", + return_gobject = FALSE + ) + + # data.table variables + parent_cluster <- NULL + + temp_cluster[, parent_cluster := cluster] + + iter_list[[cluster + index_offset]] <- temp_cluster } - } - - ## run PCA - temp_giotto = do.call('runPCA', c(gobject = temp_giotto, genes_to_use = list(featgenes), pca_param)) - - ## nearest neighbor and clustering - temp_giotto = do.call('createNearestNetwork', c(gobject = temp_giotto, k = k_neighbors, nn_param)) - - ## Leiden Cluster - ## TO DO: expand to all clustering options - temp_cluster = .doLouvainCluster_community(gobject = temp_giotto, - resolution = resolution, - python_path = python_path, - name = 'tempclus', - return_gobject = F) - - # data.table variables - parent_cluster = NULL - - temp_cluster[, parent_cluster := cluster] - - iter_list[[cluster+index_offset]] = temp_cluster - - - } - } - - together = do.call('rbind', iter_list) + together <- do.call("rbind", iter_list) - # data.table variables - comb = tempclus = NULL + # data.table variables + comb <- tempclus <- NULL - together[, comb := paste0(parent_cluster,'.',tempclus)] + together[, comb := paste0(parent_cluster, ".", tempclus)] - # rename with subcluster of original name - #new_cluster_column = paste0(cluster_column,'_sub') - setnames(together, 'comb', name) + # rename with subcluster of original name + # new_cluster_column = paste0(cluster_column,'_sub') + setnames(together, "comb", name) - if(return_gobject == TRUE) { - - cluster_names = names(gobject@cell_metadata) - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata - cell_metadata[, eval(name) := NULL] - gobject@cell_metadata = cell_metadata - } - - gobject <- addCellMetadata(gobject, new_metadata = together[, c('cell_ID', name), with = FALSE], - by_column = TRUE, column_cell_ID = 'cell_ID') - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_sub_cluster') + if (return_gobject == TRUE) { + cluster_names <- names(gobject@cell_metadata) + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata + cell_metadata[, eval(name) := NULL] + gobject@cell_metadata <- cell_metadata + } - # parameters to include - parameters_list[[update_name]] = c('subclus name' = name, - 'resolution ' = resolution, - 'k neighbors ' = k_neighbors) + gobject <- addCellMetadata(gobject, + new_metadata = together[, c("cell_ID", name), with = FALSE], + by_column = TRUE, column_cell_ID = "cell_ID" + ) - gobject@parameters = parameters_list + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_sub_cluster") - return(gobject) + # parameters to include + parameters_list[[update_name]] <- c( + "subclus name" = name, + "resolution " = resolution, + "k neighbors " = k_neighbors + ) - } else { - return(together) - } + gobject@parameters <- parameters_list + return(gobject) + } else { + return(together) + } } @@ -2054,16 +2217,20 @@ doLeidenSubCluster = function(gobject, #' @title doLouvainSubCluster multinet #' @name .doLouvainSubCluster_multinet -#' @description subcluster cells using a NN-network and the Louvain multinet detection algorithm +#' @description subcluster cells using a NN-network and the Louvain multinet +#' detection algorithm #' @param gobject giotto object #' @param name 
name for new clustering result #' @param cluster_column cluster column to subcluster #' @param selected_clusters only do subclustering on these clusters #' @param hvg_param parameters for calculateHVG #' @param hvg_min_perc_cells threshold for detection in min percentage of cells -#' @param hvg_mean_expr_det threshold for mean expression level in cells with detection -#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as input for PCA -#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as input for PCA +#' @param hvg_mean_expr_det threshold for mean expression level in cells with +#' detection +#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as +#' input for PCA +#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as +#' input for PCA #' @param pca_param parameters for runPCA #' @param nn_param parameters for parameters for createNearestNetwork #' @param k_neighbors number of k for createNearestNetwork @@ -2074,7 +2241,8 @@ doLeidenSubCluster = function(gobject, #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param verbose verbose #' @return giotto object with new subclusters appended to cell metadata -#' @details This function performs subclustering using the Louvain multinet algorithm on selected clusters. +#' @details This function performs subclustering using the Louvain multinet +#' algorithm on selected clusters. #' The systematic steps are: #' \itemize{ #' \item{1. subset Giotto object} @@ -2085,157 +2253,168 @@ doLeidenSubCluster = function(gobject, #' } #' @seealso \code{\link{.doLouvainCluster_multinet}} #' @keywords internal -.doLouvainSubCluster_multinet = function(gobject, - name = 'sub_louvain_mult_clus', - cluster_column = NULL, - selected_clusters = NULL, - hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, expression_values = 'normalized'), - hvg_min_perc_cells = 5, - hvg_mean_expr_det = 1, - use_all_genes_as_hvg = FALSE, - min_nr_of_hvg = 5, - pca_param = list(expression_values = 'normalized', scale_unit = TRUE), - nn_param = list(dimensions_to_use = 1:20), - k_neighbors = 10, - gamma = 1, - omega = 1, - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - return_gobject = TRUE, - verbose = TRUE) { - - - if("multinet" %in% rownames(installed.packages()) == FALSE) { - stop("\n package 'multinet' is not yet installed \n", - "To install: \n", - "install.packages('multinet')" - ) - } - - iter_list = list() - - cell_metadata = pDataDT(gobject) - - if(is.null(cluster_column)) { - stop('\n You need to provide a cluster column to subcluster on \n') - } - unique_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) - - ## if clusters start at 0, then add +1 for the index ## - index_offset = ifelse(0 %in% unique_clusters, 1, 0) - - - # data.table variables - hvg = perc_cells = mean_expr_det = parent_cluster = cell_ID = comb = tempclus = NULL - - - for(cluster in unique_clusters) { - - if(verbose == TRUE) cat('\n start with cluster: ', cluster, '\n') - - ## get subset - subset_cell_IDs = cell_metadata[get(cluster_column) == cluster][['cell_ID']] - temp_giotto = subsetGiotto(gobject = gobject, cell_ids = subset_cell_IDs) +.doLouvainSubCluster_multinet <- function(gobject, + name = "sub_louvain_mult_clus", + cluster_column = NULL, + selected_clusters = NULL, + hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, + expression_values = "normalized"), + hvg_min_perc_cells = 5, + hvg_mean_expr_det = 1, + use_all_genes_as_hvg = FALSE, 
+ min_nr_of_hvg = 5, + pca_param = list(expression_values = "normalized", scale_unit = TRUE), + nn_param = list(dimensions_to_use = 1:20), + k_neighbors = 10, + gamma = 1, + omega = 1, + nn_network_to_use = "sNN", + network_name = "sNN.pca", + return_gobject = TRUE, + verbose = TRUE) { + if ("multinet" %in% rownames(installed.packages()) == FALSE) { + stop( + "package 'multinet' is not yet installed \n", + "To install: \n", + "install.packages('multinet')" + ) + } - ## if cluster is not selected - if(!is.null(selected_clusters) & !cluster %in% selected_clusters) { + iter_list <- list() - temp_cluster = data.table('cell_ID' = subset_cell_IDs, 'tempclus' = 1, 'parent_cluster' = cluster) - iter_list[[cluster+index_offset]] = temp_cluster + cell_metadata <- pDataDT(gobject) - } else { - # continue for selected clusters or all clusters if there is no selection - - ## calculate stats - temp_giotto <- addStatistics(gobject = temp_giotto) - - ## calculate variable genes - temp_giotto = do.call('calculateHVG', c(gobject = temp_giotto, hvg_param)) - - ## get hvg - gene_metadata = fDataDT(temp_giotto) - featgenes = gene_metadata[hvg == 'yes' & perc_cells >= hvg_min_perc_cells & mean_expr_det >= hvg_mean_expr_det]$gene_ID - - ## catch too low number of hvg - if(use_all_genes_as_hvg == TRUE) { - featgenes == gene_metadata$gene_ID - } else { - if(verbose == TRUE) cat('\n', length(featgenes), 'highly variable genes have been selected \n') - if(length(featgenes) <= min_nr_of_hvg) { - cat('\n too few genes, will continue with all genes instead \n') - featgenes = gene_metadata$gene_ID + if (is.null(cluster_column)) { + stop("You need to provide a cluster column to subcluster on") + } + unique_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) + + ## if clusters start at 0, then add +1 for the index ## + index_offset <- ifelse(0 %in% unique_clusters, 1, 0) + + + # data.table variables + hvg <- perc_cells <- mean_expr_det <- parent_cluster <- cell_ID <- + comb <- tempclus <- NULL + + + for (cluster in unique_clusters) { + if (verbose == TRUE) cat("start with cluster: ", cluster) + + ## get subset + subset_cell_IDs <- cell_metadata[ + get(cluster_column) == cluster][["cell_ID"]] + temp_giotto <- subsetGiotto(gobject = gobject, + cell_ids = subset_cell_IDs) + + ## if cluster is not selected + if (!is.null(selected_clusters) & !cluster %in% selected_clusters) { + temp_cluster <- data.table( + "cell_ID" = subset_cell_IDs, + "tempclus" = 1, + "parent_cluster" = cluster) + iter_list[[cluster + index_offset]] <- temp_cluster + } else { + # continue for selected clusters or all clusters if there is no + # selection + + ## calculate stats + temp_giotto <- addStatistics(gobject = temp_giotto) + + ## calculate variable genes + temp_giotto <- do.call( + "calculateHVG", c(gobject = temp_giotto, hvg_param)) + + ## get hvg + gene_metadata <- fDataDT(temp_giotto) + featgenes <- gene_metadata[ + hvg == "yes" & perc_cells >= hvg_min_perc_cells & + mean_expr_det >= hvg_mean_expr_det]$gene_ID + + ## catch too low number of hvg + if (use_all_genes_as_hvg == TRUE) { + featgenes == gene_metadata$gene_ID + } else { + if (verbose == TRUE) + cat(length(featgenes), + "highly variable genes have been selecteds") + if (length(featgenes) <= min_nr_of_hvg) { + message("too few genes, will continue with all genes + instead") + featgenes <- gene_metadata$gene_ID + } + } + + ## run PCA + temp_giotto <- do.call( + "runPCA", + c(gobject = temp_giotto, genes_to_use = list(featgenes), + pca_param)) + + ## nearest neighbor and 
clustering + temp_giotto <- do.call( + "createNearestNetwork", + c(gobject = temp_giotto, k = k_neighbors, nn_param)) + + ## Leiden Cluster + ## TO DO: expand to all clustering options + temp_cluster <- .doLouvainCluster_multinet( + gobject = temp_giotto, + gamma = gamma, + omega = omega, + name = "tempclus", + return_gobject = FALSE + ) + + temp_cluster[, parent_cluster := cluster] + temp_cluster <- temp_cluster[, .(cell_ID, tempclus, parent_cluster)] + + iter_list[[cluster + index_offset]] <- temp_cluster } - } - - ## run PCA - temp_giotto = do.call('runPCA', c(gobject = temp_giotto, genes_to_use = list(featgenes), pca_param)) - - ## nearest neighbor and clustering - temp_giotto = do.call('createNearestNetwork', c(gobject = temp_giotto, k = k_neighbors, nn_param)) - - ## Leiden Cluster - ## TO DO: expand to all clustering options - temp_cluster = .doLouvainCluster_multinet(gobject = temp_giotto, - gamma = gamma, - omega = omega, - name = 'tempclus', - return_gobject = F) - - temp_cluster[, parent_cluster := cluster] - temp_cluster = temp_cluster[,.(cell_ID, tempclus, parent_cluster)] - - iter_list[[cluster+index_offset]] = temp_cluster - - - } - } - - together = do.call('rbind', iter_list) - together[, comb := paste0(parent_cluster,'.',tempclus)] + together <- do.call("rbind", iter_list) + together[, comb := paste0(parent_cluster, ".", tempclus)] - # rename with subcluster of original name - #new_cluster_column = paste0(cluster_column,'_sub') - setnames(together, 'comb', name) + # rename with subcluster of original name + setnames(together, "comb", name) - if(return_gobject == TRUE) { - - cluster_names = names(gobject@cell_metadata) - if(name %in% cluster_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata - cell_metadata[, eval(name) := NULL] - gobject@cell_metadata = cell_metadata - } - - gobject <- addCellMetadata( - gobject, - new_metadata = together[, c('cell_ID', name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' - ) - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_sub_cluster') - - # parameters to include - parameters_list[[update_name]] = c('subclus name' = name, - 'k neighbors ' = k_neighbors, - 'gamma' = gamma, - 'omega' = omega) + if (return_gobject == TRUE) { + cluster_names <- names(gobject@cell_metadata) + if (name %in% cluster_names) { + cat(name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata + cell_metadata[, eval(name) := NULL] + gobject@cell_metadata <- cell_metadata + } - gobject@parameters = parameters_list + gobject <- addCellMetadata( + gobject, + new_metadata = together[, c("cell_ID", name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) - return(gobject) + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_sub_cluster") + + # parameters to include + parameters_list[[update_name]] <- c( + "subclus name" = name, + "k neighbors " = k_neighbors, + "gamma" = gamma, + "omega" = omega + ) - } else { - return(together) - } + gobject@parameters <- parameters_list + return(gobject) + } else { + return(together) + } } @@ -2250,9 +2429,12 @@ doLeidenSubCluster = function(gobject, #' @param selected_clusters only do subclustering on these clusters #' @param hvg_param parameters for calculateHVG #' @param 
hvg_min_perc_cells threshold for detection in min percentage of cells -#' @param hvg_mean_expr_det threshold for mean expression level in cells with detection -#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as input for PCA -#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as input for PCA +#' @param hvg_mean_expr_det threshold for mean expression level in cells with +#' detection +#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as +#' input for PCA +#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as +#' input for PCA #' @param pca_param parameters for runPCA #' @param nn_param parameters for parameters for createNearestNetwork #' @param k_neighbors number of k for createNearestNetwork @@ -2265,7 +2447,8 @@ doLeidenSubCluster = function(gobject, #' @param return_gobject boolean: return giotto object (default = TRUE) #' @param verbose verbose #' @return giotto object with new subclusters appended to cell metadata -#' @details This function performs subclustering using the Louvain algorithm on selected clusters. +#' @details This function performs subclustering using the Louvain algorithm on +#' selected clusters. #' The systematic steps are: #' \itemize{ #' \item{1. subset Giotto object} @@ -2274,75 +2457,75 @@ doLeidenSubCluster = function(gobject, #' \item{4. create nearest neighbouring network} #' \item{5. do Louvain clustering} #' } -#' @seealso \code{\link{.doLouvainCluster_multinet}} and \code{\link{.doLouvainCluster_community}} +#' @seealso \code{\link{.doLouvainCluster_multinet}} and +#' \code{\link{.doLouvainCluster_community}} #' @export -doLouvainSubCluster = function(gobject, - name = 'sub_louvain_clus', - version = c('community', 'multinet'), - cluster_column = NULL, - selected_clusters = NULL, - hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, expression_values = 'normalized'), - hvg_min_perc_cells = 5, - hvg_mean_expr_det = 1, - use_all_genes_as_hvg = FALSE, - min_nr_of_hvg = 5, - pca_param = list(expression_values = 'normalized', scale_unit = TRUE), - nn_param = list(dimensions_to_use = 1:20), - k_neighbors = 10, - resolution = 0.5, - gamma = 1, - omega = 1, - python_path = NULL, - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - return_gobject = TRUE, - verbose = TRUE) { - - ## louvain clustering version to use - version = match.arg(version, c('community', 'multinet')) - - - # python community implementation - if(version == 'community') { - - result = .doLouvainSubCluster_community(gobject = gobject, - cluster_column = cluster_column, - selected_clusters = selected_clusters, - hvg_param = hvg_param, - hvg_mean_expr_det = hvg_mean_expr_det, - pca_param = pca_param, - nn_param = nn_param, - k_neighbors = k_neighbors, - resolution = resolution, - python_path = python_path, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - name = name, - return_gobject = return_gobject, - verbose = verbose) - - } else if(version == 'multinet') { - - result = .doLouvainSubCluster_multinet(gobject = gobject, - cluster_column = cluster_column, - selected_clusters = selected_clusters, - hvg_param = hvg_param, - hvg_mean_expr_det = hvg_mean_expr_det, - pca_param = pca_param, - nn_param = nn_param, - k_neighbors = k_neighbors, - gamma = gamma, - omega = omega, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - name = name, - return_gobject = return_gobject, - verbose = verbose) - - } - - return(result) +doLouvainSubCluster <- 
function(gobject, + name = "sub_louvain_clus", + version = c("community", "multinet"), + cluster_column = NULL, + selected_clusters = NULL, + hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, + expression_values = "normalized"), + hvg_min_perc_cells = 5, + hvg_mean_expr_det = 1, + use_all_genes_as_hvg = FALSE, + min_nr_of_hvg = 5, + pca_param = list(expression_values = "normalized", scale_unit = TRUE), + nn_param = list(dimensions_to_use = 1:20), + k_neighbors = 10, + resolution = 0.5, + gamma = 1, + omega = 1, + python_path = NULL, + nn_network_to_use = "sNN", + network_name = "sNN.pca", + return_gobject = TRUE, + verbose = TRUE) { + ## louvain clustering version to use + version <- match.arg(version, c("community", "multinet")) + + + # python community implementation + if (version == "community") { + result <- .doLouvainSubCluster_community( + gobject = gobject, + cluster_column = cluster_column, + selected_clusters = selected_clusters, + hvg_param = hvg_param, + hvg_mean_expr_det = hvg_mean_expr_det, + pca_param = pca_param, + nn_param = nn_param, + k_neighbors = k_neighbors, + resolution = resolution, + python_path = python_path, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + name = name, + return_gobject = return_gobject, + verbose = verbose + ) + } else if (version == "multinet") { + result <- .doLouvainSubCluster_multinet( + gobject = gobject, + cluster_column = cluster_column, + selected_clusters = selected_clusters, + hvg_param = hvg_param, + hvg_mean_expr_det = hvg_mean_expr_det, + pca_param = pca_param, + nn_param = nn_param, + k_neighbors = k_neighbors, + gamma = gamma, + omega = omega, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + name = name, + return_gobject = return_gobject, + verbose = verbose + ) + } + return(result) } @@ -2359,9 +2542,12 @@ doLouvainSubCluster = function(gobject, #' @param selected_clusters only do subclustering on these clusters #' @param hvg_param parameters for calculateHVG #' @param hvg_min_perc_cells threshold for detection in min percentage of cells -#' @param hvg_mean_expr_det threshold for mean expression level in cells with detection -#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as input for PCA -#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as input for PCA +#' @param hvg_mean_expr_det threshold for mean expression level in cells with +#' detection +#' @param use_all_genes_as_hvg forces all genes to be HVG and to be used as +#' input for PCA +#' @param min_nr_of_hvg minimum number of HVG, or all genes will be used as +#' input for PCA #' @param pca_param parameters for runPCA #' @param nn_param parameters for parameters for createNearestNetwork #' @param k_neighbors number of k for createNearestNetwork @@ -2384,108 +2570,112 @@ doLouvainSubCluster = function(gobject, #' \item{4. create nearest neighbouring network} #' \item{5. 
do clustering} #' } -#' @seealso \code{\link{.doLouvainCluster_multinet}}, \code{\link{.doLouvainCluster_community}} +#' @seealso \code{\link{.doLouvainCluster_multinet}}, +#' \code{\link{.doLouvainCluster_community}} #' and @seealso \code{\link{doLeidenCluster}} #' @export subClusterCells <- function(gobject, - name = 'sub_clus', - cluster_method = c('leiden', - 'louvain_community', - 'louvain_multinet'), - cluster_column = NULL, - selected_clusters = NULL, - hvg_param = list(reverse_log_scale = T, difference_in_cov = 1, expression_values = 'normalized'), - hvg_min_perc_cells = 5, - hvg_mean_expr_det = 1, - use_all_genes_as_hvg = FALSE, - min_nr_of_hvg = 5, - pca_param = list(expression_values = 'normalized', scale_unit = T), - nn_param = list(dimensions_to_use = 1:20), - k_neighbors = 10, - resolution = 1, - n_iterations = 1000, - gamma = 1, - omega = 1, - python_path = NULL, - nn_network_to_use = 'sNN', - network_name = 'sNN.pca', - return_gobject = TRUE, - verbose = T) { - - ## select cluster method - cluster_method = match.arg(arg = cluster_method, choices = c('leiden', - 'louvain_community', - 'louvain_multinet')) - - - if(cluster_method == 'leiden') { - - result = doLeidenSubCluster(gobject = gobject, - cluster_column = cluster_column, - selected_clusters = selected_clusters, - hvg_param = hvg_param, - hvg_min_perc_cells = hvg_min_perc_cells, - hvg_mean_expr_det = hvg_mean_expr_det, - use_all_genes_as_hvg = use_all_genes_as_hvg, - min_nr_of_hvg = min_nr_of_hvg, - pca_param = pca_param, - nn_param = nn_param, - k_neighbors = k_neighbors, - resolution = resolution, - n_iterations = n_iterations, - python_path = python_path, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - name = name, - return_gobject = return_gobject, - verbose = verbose) - - } else if(cluster_method == 'louvain_community') { - - result = .doLouvainSubCluster_community(gobject = gobject, - cluster_column = cluster_column, - selected_clusters = selected_clusters, - hvg_param = hvg_param, - hvg_min_perc_cells = hvg_min_perc_cells, - hvg_mean_expr_det = hvg_mean_expr_det, - use_all_genes_as_hvg = use_all_genes_as_hvg, - min_nr_of_hvg = min_nr_of_hvg, - pca_param = pca_param, - nn_param = nn_param, - k_neighbors = k_neighbors, - resolution = resolution, - python_path = python_path, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - name = name, - return_gobject = return_gobject, - verbose = verbose) - - } else if(cluster_method == 'louvain_multinet') { - - result = .doLouvainSubCluster_multinet(gobject = gobject, - cluster_column = cluster_column, - selected_clusters = selected_clusters, - hvg_param = hvg_param, - hvg_min_perc_cells = hvg_min_perc_cells, - hvg_mean_expr_det = hvg_mean_expr_det, - use_all_genes_as_hvg = use_all_genes_as_hvg, - min_nr_of_hvg = min_nr_of_hvg, - pca_param = pca_param, - nn_param = nn_param, - k_neighbors = k_neighbors, - gamma = gamma, - omega = omega, - nn_network_to_use = nn_network_to_use, - network_name = network_name, - name = name, - return_gobject = return_gobject, - verbose = verbose) - - } - - return(result) + name = "sub_clus", + cluster_method = c( + "leiden", + "louvain_community", + "louvain_multinet" + ), + cluster_column = NULL, + selected_clusters = NULL, + hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, + expression_values = "normalized"), + hvg_min_perc_cells = 5, + hvg_mean_expr_det = 1, + use_all_genes_as_hvg = FALSE, + min_nr_of_hvg = 5, + pca_param = list(expression_values = "normalized", scale_unit = 
TRUE), + nn_param = list(dimensions_to_use = 1:20), + k_neighbors = 10, + resolution = 1, + n_iterations = 1000, + gamma = 1, + omega = 1, + python_path = NULL, + nn_network_to_use = "sNN", + network_name = "sNN.pca", + return_gobject = TRUE, + verbose = TRUE) { + ## select cluster method + cluster_method <- match.arg(arg = cluster_method, choices = c( + "leiden", + "louvain_community", + "louvain_multinet" + )) + + + if (cluster_method == "leiden") { + result <- doLeidenSubCluster( + gobject = gobject, + cluster_column = cluster_column, + selected_clusters = selected_clusters, + hvg_param = hvg_param, + hvg_min_perc_cells = hvg_min_perc_cells, + hvg_mean_expr_det = hvg_mean_expr_det, + use_all_genes_as_hvg = use_all_genes_as_hvg, + min_nr_of_hvg = min_nr_of_hvg, + pca_param = pca_param, + nn_param = nn_param, + k_neighbors = k_neighbors, + resolution = resolution, + n_iterations = n_iterations, + python_path = python_path, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + name = name, + return_gobject = return_gobject, + verbose = verbose + ) + } else if (cluster_method == "louvain_community") { + result <- .doLouvainSubCluster_community( + gobject = gobject, + cluster_column = cluster_column, + selected_clusters = selected_clusters, + hvg_param = hvg_param, + hvg_min_perc_cells = hvg_min_perc_cells, + hvg_mean_expr_det = hvg_mean_expr_det, + use_all_genes_as_hvg = use_all_genes_as_hvg, + min_nr_of_hvg = min_nr_of_hvg, + pca_param = pca_param, + nn_param = nn_param, + k_neighbors = k_neighbors, + resolution = resolution, + python_path = python_path, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + name = name, + return_gobject = return_gobject, + verbose = verbose + ) + } else if (cluster_method == "louvain_multinet") { + result <- .doLouvainSubCluster_multinet( + gobject = gobject, + cluster_column = cluster_column, + selected_clusters = selected_clusters, + hvg_param = hvg_param, + hvg_min_perc_cells = hvg_min_perc_cells, + hvg_mean_expr_det = hvg_mean_expr_det, + use_all_genes_as_hvg = use_all_genes_as_hvg, + min_nr_of_hvg = min_nr_of_hvg, + pca_param = pca_param, + nn_param = nn_param, + k_neighbors = k_neighbors, + gamma = gamma, + omega = omega, + nn_network_to_use = nn_network_to_use, + network_name = network_name, + name = name, + return_gobject = return_gobject, + verbose = verbose + ) + } + return(result) } @@ -2497,7 +2687,8 @@ subClusterCells <- function(gobject, #' @title getClusterSimilarity #' @name getClusterSimilarity -#' @description Creates data.table with pairwise correlation scores between each cluster. +#' @description Creates data.table with pairwise correlation scores between +#' each cluster. #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -2505,69 +2696,81 @@ subClusterCells <- function(gobject, #' @param cluster_column name of column to use for clusters #' @param cor correlation score to calculate distance #' @return data.table -#' @details Creates data.table with pairwise correlation scores between each cluster and -#' the group size (# of cells) for each cluster. This information can be used together -#' with mergeClusters to combine very similar or small clusters into bigger clusters. +#' @details Creates data.table with pairwise correlation scores between each +#' cluster and the group size (# of cells) for each cluster. This information +#' can be used together with mergeClusters to combine very similar or small +#' clusters into bigger clusters. 
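+#'
+#' A minimal usage sketch (not run), assuming `g` is a normalized Giotto
+#' object whose cell metadata already holds a "leiden_clus" cluster column;
+#' both names are placeholders:
+#' \preformatted{
+#' sim_dt <- getClusterSimilarity(
+#'     gobject = g,
+#'     cluster_column = "leiden_clus",
+#'     expression_values = "normalized",
+#'     cor = "pearson"
+#' )
+#' # most similar (candidate mergeable) cluster pairs first
+#' sim_dt[group1 != group2][order(-value)]
+#' }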
#' @export getClusterSimilarity <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - cor = c('pearson', 'spearman')) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # data.table variables - group1 = group2 = unified_group = value = NULL - - cor = match.arg(cor, c('pearson', 'spearman')) - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - metadata = pDataDT(gobject, - feat_type = feat_type, - spat_unit = spat_unit) - - # get clustersize - clustersize = metadata[, .N, by = cluster_column] - colnames(clustersize) = c('clusters', 'size') - - # data.table variables - clusters = NULL + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + cor = c("pearson", "spearman")) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - clustersize[, clusters := as.character(clusters)] + # data.table variables + group1 <- group2 <- unified_group <- value <- NULL - # scores per cluster - metatable = calculateMetaTable(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expression_values = values, - metadata_cols = cluster_column) - dcast_metatable = data.table::dcast.data.table(metatable, formula = variable~uniq_ID, value.var = 'value') - testmatrix = dt_to_matrix(x = dcast_metatable) + cor <- match.arg(cor, c("pearson", "spearman")) + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) - # correlation matrix - cormatrix = cor_flex(x = testmatrix, method = cor) - cor_table = data.table::as.data.table(reshape2::melt(cormatrix)) - data.table::setnames(cor_table, old = c('Var1', 'Var2'), c('group1', 'group2')) - cor_table[, c('group1', 'group2') := list(as.character(group1), as.character(group2))] - cor_table[, unified_group := paste(sort(c(group1, group2)), collapse = '--'), by = 1:nrow(cor_table)] - cor_table = cor_table[!duplicated(cor_table[, .(value, unified_group)])] + metadata <- pDataDT(gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) - cor_table = merge(cor_table, by.x = 'group1', clustersize, by.y = 'clusters') - setnames(cor_table, 'size', 'group1_size') - cor_table = merge(cor_table, by.x = 'group2', clustersize, by.y = 'clusters') - setnames(cor_table, 'size', 'group2_size') + # get clustersize + clustersize <- metadata[, .N, by = cluster_column] + colnames(clustersize) <- c("clusters", "size") - return(cor_table) + # data.table variables + clusters <- NULL + clustersize[, clusters := as.character(clusters)] + # scores per cluster + metatable <- calculateMetaTable( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + metadata_cols = cluster_column + ) + dcast_metatable <- data.table::dcast.data.table( + metatable, formula = variable ~ uniq_ID, value.var = "value") + testmatrix <- dt_to_matrix(x = dcast_metatable) + + # correlation matrix + cormatrix <- cor_flex(x = testmatrix, method = cor) + cor_table <- data.table::as.data.table(reshape2::melt(cormatrix)) + data.table::setnames( + cor_table, old = c("Var1", 
"Var2"), c("group1", "group2")) + cor_table[, c("group1", "group2") := list( + as.character(group1), as.character(group2))] + cor_table[, unified_group := paste( + sort(c(group1, group2)), collapse = "--"), by = 1:nrow(cor_table)] + cor_table <- cor_table[!duplicated(cor_table[, .(value, unified_group)])] + + cor_table <- merge( + cor_table, by.x = "group1", clustersize, by.y = "clusters") + setnames(cor_table, "size", "group1_size") + cor_table <- merge( + cor_table, by.x = "group2", clustersize, by.y = "clusters") + setnames(cor_table, "size", "group2_size") + + return(cor_table) } @@ -2575,7 +2778,8 @@ getClusterSimilarity <- function(gobject, #' @title mergeClusters #' @name mergeClusters -#' @description Merge selected clusters based on pairwise correlation scores and size of cluster. +#' @description Merge selected clusters based on pairwise correlation scores +#' and size of cluster. #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -2585,158 +2789,168 @@ getClusterSimilarity <- function(gobject, #' @param new_cluster_name new name for merged clusters #' @param min_cor_score min correlation score to merge pairwise clusters #' @param max_group_size max cluster size that can be merged -#' @param force_min_group_size size of clusters that will be merged with their most similar neighbor(s) -#' @param max_sim_clusters maximum number of clusters to potentially merge to reach force_min_group_size +#' @param force_min_group_size size of clusters that will be merged with their +#' most similar neighbor(s) +#' @param max_sim_clusters maximum number of clusters to potentially merge to +#' reach force_min_group_size #' @param return_gobject return giotto object #' @param verbose be verbose #' @return Giotto object -#' @details Merge selected clusters based on pairwise correlation scores and size of cluster. -#' To avoid large clusters to merge the max_group_size can be lowered. Small clusters can -#' be forcibly merged with their most similar pairwise cluster by adjusting the -#' force_min_group_size parameter. Clusters smaller than this value will be merged -#' independent on the provided min_cor_score value. The force_min_group_size might not always -#' be reached if clusters have already been merged before \cr -#' A giotto object is returned by default, if FALSE then the merging vector will be returned. +#' @details Merge selected clusters based on pairwise correlation scores and +#' size of cluster. +#' To avoid large clusters to merge the max_group_size can be lowered. Small +#' clusters can be forcibly merged with their most similar pairwise cluster by +#' adjusting the force_min_group_size parameter. Clusters smaller than this +#' value will be merged independent on the provided min_cor_score value. The +#' force_min_group_size might not always be reached if clusters have already +#' been merged before \cr +#' A giotto object is returned by default, if FALSE then the merging vector +#' will be returned. 
#' @export mergeClusters <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - cor = c('pearson', 'spearman'), - new_cluster_name = 'merged_cluster', - min_cor_score = 0.8, - max_group_size = 20, - force_min_group_size = 10, - max_sim_clusters = 10, - return_gobject = TRUE, - verbose = TRUE) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - # correlation score to be used - cor = match.arg(cor, c('pearson', 'spearman')) - - # calculate similarity data.table - similarityDT = getClusterSimilarity(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expression_values = values, - cluster_column = cluster_column, - cor = cor) - - ## get clusters that can be merged - # 1. clusters with high correlation - - # data.table variables - group1 = group2 = group1_size = value = cumsum_val = group2_size = min_reached = cumsum_reached = NULL - - filter_set_first = similarityDT[group1 != group2][group1_size < max_group_size][value >= min_cor_score] - - # 2. small clusters - minimum_set = similarityDT[group1 != group2][group1_size < force_min_group_size][order(-value)][, head(.SD, max_sim_clusters), by = group1] - - # 2.1 take all clusters necessary to reach force_min_group_size - minimum_set[, cumsum_val := cumsum(group2_size) + group1_size, by = group1] - minimum_set[, min_reached := ifelse(cumsum_val > max_group_size, 1, 0)] - minimum_set[, cumsum_reached := cumsum(min_reached), by = group1] - minimum_set = minimum_set[cumsum_reached <= 1] - minimum_set[, c('cumsum_val', 'min_reached', 'cumsum_reached') := NULL] - - filter_set = unique(do.call('rbind', list(filter_set_first, minimum_set))) - - ## get list of correlated groups - finallist = list() - start_i = 1 - for(row in 1:nrow(filter_set)) { - - first_clus = filter_set[row][['group1']] - second_clus = filter_set[row][['group2']] - - res = lapply(finallist, function(x) {any(x %in% c(first_clus, second_clus))}) - - if(all(res == F)) { - - - finallist[[start_i]] = c(first_clus, second_clus) - start_i = start_i + 1 - - } else if(length(res[res == T]) == 2) { - - - NULL + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + cor = c("pearson", "spearman"), + new_cluster_name = "merged_cluster", + min_cor_score = 0.8, + max_group_size = 20, + force_min_group_size = 10, + max_sim_clusters = 10, + return_gobject = TRUE, + verbose = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } else { + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + # correlation score to be used + cor <- match.arg(cor, c("pearson", "spearman")) - who = which(res == TRUE)[[1]] - finallist[[who]] = unique(c(finallist[[who]], first_clus, second_clus)) + # calculate similarity data.table + similarityDT <- getClusterSimilarity( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + 
cluster_column = cluster_column, + cor = cor + ) + ## get clusters that can be merged + # 1. clusters with high correlation + + # data.table variables + group1 <- group2 <- group1_size <- value <- cumsum_val <- group2_size <- + min_reached <- cumsum_reached <- NULL + + filter_set_first <- similarityDT[group1 != group2][ + group1_size < max_group_size][value >= min_cor_score] + + # 2. small clusters + minimum_set <- similarityDT[group1 != group2][ + group1_size < force_min_group_size][order(-value)][ + , head(.SD, max_sim_clusters), by = group1] + + # 2.1 take all clusters necessary to reach force_min_group_size + minimum_set[, cumsum_val := cumsum(group2_size) + group1_size, by = group1] + minimum_set[, min_reached := ifelse(cumsum_val > max_group_size, 1, 0)] + minimum_set[, cumsum_reached := cumsum(min_reached), by = group1] + minimum_set <- minimum_set[cumsum_reached <= 1] + minimum_set[, c("cumsum_val", "min_reached", "cumsum_reached") := NULL] + + filter_set <- unique(do.call("rbind", list(filter_set_first, minimum_set))) + + ## get list of correlated groups + finallist <- list() + start_i <- 1 + for (row in 1:nrow(filter_set)) { + first_clus <- filter_set[row][["group1"]] + second_clus <- filter_set[row][["group2"]] + + res <- lapply(finallist, function(x) { + any(x %in% c(first_clus, second_clus)) + }) + + if (all(res == FALSE)) { + finallist[[start_i]] <- c(first_clus, second_clus) + start_i <- start_i + 1 + } else if (length(res[res == TRUE]) == 2) { + NULL + } else { + who <- which(res == TRUE)[[1]] + finallist[[who]] <- unique( + c(finallist[[who]], first_clus, second_clus)) + } } - } + ## update metadata + metadata <- data.table::copy(pDataDT(gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) + + finalvec <- NULL + for (ll in seq_along(finallist)) { + tempvec <- finallist[[ll]] + names(tempvec) <- rep(paste0("m_", ll), length(tempvec)) + finalvec <- c(finalvec, tempvec) + } - ## update metadata - metadata = data.table::copy(pDataDT(gobject, - spat_unit = spat_unit, - feat_type = feat_type)) + metadata[, eval(new_cluster_name) := ifelse( + as.character(get(cluster_column)) %in% finalvec, + names(finalvec[finalvec == as.character(get(cluster_column))]), + as.character(get(cluster_column)) + ), by = 1:nrow(metadata)] - finalvec = NULL - for(ll in seq_along(finallist)) { - tempvec = finallist[[ll]]; names(tempvec) = rep(paste0('m_', ll), length(tempvec)) - finalvec = c(finalvec, tempvec) - } - metadata[, eval(new_cluster_name) := ifelse(as.character(get(cluster_column)) %in% finalvec, - names(finalvec[finalvec == as.character(get(cluster_column))]), - as.character(get(cluster_column))), by = 1:nrow(metadata)] + if (return_gobject == TRUE) { + cluster_names <- names(pDataDT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + )) - if(return_gobject == TRUE) { + if (new_cluster_name %in% cluster_names) { + cat(new_cluster_name, " has already been used, will be overwritten") + cell_metadata <- gobject@cell_metadata[[feat_type]][[spat_unit]] + cell_metadata[, eval(new_cluster_name) := NULL] + gobject@cell_metadata[[feat_type]][[spat_unit]] <- cell_metadata + } - cluster_names = names(pDataDT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type)) + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = metadata[ + , c("cell_ID", new_cluster_name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) - #cluster_names = 
names(gobject@cell_metadata[[feat_type]][[spat_unit]]) - if(new_cluster_name %in% cluster_names) { - cat('\n ', new_cluster_name, ' has already been used, will be overwritten \n') - cell_metadata = gobject@cell_metadata[[feat_type]][[spat_unit]] - cell_metadata[, eval(new_cluster_name) := NULL] - gobject@cell_metadata[[feat_type]][[spat_unit]] = cell_metadata + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_merge_cluster") + return(gobject) + } else { + return(list(mergevector = finalvec, metadata = metadata)) } - - gobject = addCellMetadata( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = metadata[, c('cell_ID', new_cluster_name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' - ) - - - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_merge_cluster') - return(gobject) - - } else { - - return(list(mergevector = finalvec, metadata = metadata)) - - } } @@ -2745,102 +2959,104 @@ mergeClusters <- function(gobject, #' @title Split dendrogram in two #' @name .split_dendrogram_in_two -#' @description Merge selected clusters based on pairwise correlation scores and size of cluster. +#' @description Merge selected clusters based on pairwise correlation scores +#' and size of cluster. #' @param dend dendrogram object #' @return list of two dendrograms and height of node #' @keywords internal -.split_dendrogram_in_two = function(dend) { - - top_height = attributes(dend)$height - divided_leaves_labels = dendextend::cut_lower_fun(dend, h = top_height) - - # this works for both numericala nd character leave names - all_leaves = dendextend::get_leaves_attr(dend = dend, attribute = 'label') - selected_labels_ind_1 = all_leaves %in% divided_leaves_labels[[1]] - selected_labels_ind_2 = all_leaves %in% divided_leaves_labels[[2]] - numerical_leaves = unlist(dend) - names(numerical_leaves) = all_leaves - - dend_1 = dendextend::find_dendrogram(dend = dend, selected_labels = names(numerical_leaves[selected_labels_ind_1])) - dend_2 = dendextend::find_dendrogram(dend = dend, selected_labels = names(numerical_leaves[selected_labels_ind_2])) - - #dend_1 = dendextend::find_dendrogram(dend = dend, selected_labels = divided_leaves_labels[[1]]) - #dend_2 = dendextend::find_dendrogram(dend = dend, selected_labels = divided_leaves_labels[[2]]) - - return(list(theight = top_height, dend1 = dend_1, dend2 = dend_2)) +.split_dendrogram_in_two <- function(dend) { + top_height <- attributes(dend)$height + divided_leaves_labels <- dendextend::cut_lower_fun(dend, h = top_height) + + # this works for both numericala nd character leave names + all_leaves <- dendextend::get_leaves_attr(dend = dend, attribute = "label") + selected_labels_ind_1 <- all_leaves %in% divided_leaves_labels[[1]] + selected_labels_ind_2 <- all_leaves %in% divided_leaves_labels[[2]] + numerical_leaves <- unlist(dend) + names(numerical_leaves) <- all_leaves + + dend_1 <- dendextend::find_dendrogram( + dend = dend, + selected_labels = names(numerical_leaves[selected_labels_ind_1])) + dend_2 <- dendextend::find_dendrogram( + dend = dend, + selected_labels = names(numerical_leaves[selected_labels_ind_2])) + + return(list(theight = top_height, dend1 = dend_1, dend2 = dend_2)) } #' @title Node clusters #' @name .node_clusters -#' @description Merge selected clusters based on pairwise correlation scores and size of cluster. +#' @description Merge selected clusters based on pairwise correlation scores +#' and size of cluster. 
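+#' In practice this helper walks the dendrogram from the highest to the lowest
+#' node height and records, for every split, the leaf labels on either side.
+#' A standalone sketch of the idea (not run), using only base R on a toy
+#' hclust object:
+#' \preformatted{
+#' m <- matrix(rnorm(40), ncol = 2, dimnames = list(paste0("c", 1:20), NULL))
+#' hc <- hclust(dist(m), method = "ward.D")
+#' # the two leaf groups created by the topmost split of the tree
+#' split(rownames(m), cutree(hc, k = 2))
+#' }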
#' @param hclus_obj hclus object #' @param verbose be verbose #' @return list of splitted dendrogram nodes from high to low node height #' @keywords internal -.node_clusters = function(hclus_obj, verbose = TRUE) { - - heights = sort(hclus_obj[['height']], decreasing = T) - mydend = stats::as.dendrogram(hclus_obj) - +.node_clusters <- function(hclus_obj, verbose = TRUE) { + heights <- sort(hclus_obj[["height"]], decreasing = TRUE) + mydend <- stats::as.dendrogram(hclus_obj) - result_list = list() - j = 1 - dend_list = list() - i = 1 - dend_list[[i]] = mydend + result_list <- list() + j <- 1 - ## create split at each height ## - for(n_height in heights) { + dend_list <- list() + i <- 1 + dend_list[[i]] <- mydend - if(verbose == TRUE) cat('height ', n_height, '\n') + ## create split at each height ## + for (n_height in heights) { + if (verbose == TRUE) cat("height ", n_height, "\n") - # only use dendrogram objects - ind = lapply(dend_list, FUN = function(x) class(x) == 'dendrogram') - dend_list = dend_list[unlist(ind)] + # only use dendrogram objects + ind <- lapply(dend_list, FUN = function(x) class(x) == "dendrogram") + dend_list <- dend_list[unlist(ind)] - # check which heights are available - available_h = as.numeric(unlist(lapply(dend_list, FUN = function(x) attributes(x)$height))) + # check which heights are available + available_h <- as.numeric(unlist(lapply( + dend_list, FUN = function(x) attributes(x)$height))) - # get dendrogram associated with height and split in two - # select_dend_ind = which(available_h == n_height) - select_dend_ind = which.min(abs(available_h - n_height)) - select_dend = dend_list[[select_dend_ind]] - tempres = .split_dendrogram_in_two(dend = select_dend) + # get dendrogram associated with height and split in two + select_dend_ind <- which.min(abs(available_h - n_height)) + select_dend <- dend_list[[select_dend_ind]] + tempres <- .split_dendrogram_in_two(dend = select_dend) - # find leave labels - toph = tempres[[1]] - first_group = dendextend::get_leaves_attr(tempres[[2]], attribute = 'label') - second_group = dendextend::get_leaves_attr(tempres[[3]], attribute = 'label') + # find leave labels + toph <- tempres[[1]] + first_group <- dendextend::get_leaves_attr( + tempres[[2]], attribute = "label") + second_group <- dendextend::get_leaves_attr( + tempres[[3]], attribute = "label") - result_list[[j]] = list('height' = toph, 'first' = first_group, 'sec' = second_group) - j = j+1 + result_list[[j]] <- list("height" = toph, + "first" = first_group, + "sec" = second_group) + j <- j + 1 - ## add dendrograms to list - ind = lapply(tempres, FUN = function(x) class(x) == 'dendrogram') - tempres_dend = tempres[unlist(ind)] + ## add dendrograms to list + ind <- lapply(tempres, FUN = function(x) class(x) == "dendrogram") + tempres_dend <- tempres[unlist(ind)] - dend_list[[i+1]] = tempres_dend[[1]] - dend_list[[i+2]] = tempres_dend[[2]] + dend_list[[i + 1]] <- tempres_dend[[1]] + dend_list[[i + 2]] <- tempres_dend[[2]] - i = i+2 - - - } - - return(list(dend_list, result_list)) + i <- i + 2 + } + return(list(dend_list, result_list)) } #' @title getDendrogramSplits #' @name getDendrogramSplits -#' @description Split dendrogram at each node and keep the leave (label) information.. +#' @description Split dendrogram at each node and keep the leave (label) +#' information. 
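+#'
+#' A minimal usage sketch (not run), assuming `g` is a Giotto object with an
+#' existing "leiden_clus" cluster column (both placeholder names):
+#' \preformatted{
+#' splits <- getDendrogramSplits(
+#'     gobject = g,
+#'     cluster_column = "leiden_clus",
+#'     cor = "pearson",
+#'     distance = "ward.D",
+#'     show_dend = FALSE
+#' )
+#' # one row per node: node height plus the two sets of cluster labels
+#' splits
+#' }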
#' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -2853,76 +3069,84 @@ mergeClusters <- function(gobject, #' @param show_dend show dendrogram #' @param verbose be verbose #' @return data.table object -#' @details Creates a data.table with three columns and each row represents a node in the -#' dendrogram. For each node the height of the node is given together with the two -#' subdendrograms. This information can be used to determine in a hierarchical manner -#' differentially expressed marker genes at each node. +#' @details Creates a data.table with three columns and each row represents a +#' node in the dendrogram. For each node the height of the node is given +#' together with the two subdendrograms. This information can be used to +#' determine in a hierarchical manner differentially expressed marker genes at +#' each node. #' @export -getDendrogramSplits = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - cor = c('pearson', 'spearman'), - distance = 'ward.D', - h = NULL, - h_color = 'red', - show_dend = TRUE, - verbose = TRUE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # package check for dendextend - package_check(pkg_name = "dendextend", repository = "CRAN") - - # data.table variables - nodeID = NULL - - cor = match.arg(cor, c('pearson', 'spearman')) - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - # create average expression matrix per cluster - metatable = calculateMetaTable(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expression_values = values, - metadata_cols = cluster_column) - dcast_metatable = data.table::dcast.data.table(metatable, formula = variable~uniq_ID, value.var = 'value') - testmatrix = dt_to_matrix(x = dcast_metatable) - - # correlation - cormatrix = cor_flex(x = testmatrix, method = cor) - cordist = stats::as.dist(1 - cormatrix, diag = T, upper = T) - corclus = stats::hclust(d = cordist, method = distance) - - cordend = stats::as.dendrogram(object = corclus) - - - if(show_dend == TRUE) { - # plot dendrogram - graphics::plot(cordend) - - # add horizontal lines - if(!is.null(h)) { - graphics::abline(h = h, col = h_color) - } - } +getDendrogramSplits <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + cor = c("pearson", "spearman"), + distance = "ward.D", + h = NULL, + h_color = "red", + show_dend = TRUE, + verbose = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # package check for dendextend + package_check(pkg_name = "dendextend", repository = "CRAN") + # data.table variables + nodeID <- NULL - splitList = .node_clusters(hclus_obj = corclus, verbose = verbose) + cor <- match.arg(cor, c("pearson", "spearman")) + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) - splitDT = data.table::as.data.table(t_flex(data.table::as.data.table(splitList[[2]]))) - colnames(splitDT) = c('node_h', 'tree_1', 'tree_2') - splitDT[, nodeID := 
paste0('node_', 1:.N)] + # create average expression matrix per cluster + metatable <- calculateMetaTable( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + metadata_cols = cluster_column + ) + dcast_metatable <- data.table::dcast.data.table( + metatable, formula = variable ~ uniq_ID, value.var = "value") + testmatrix <- dt_to_matrix(x = dcast_metatable) - return(splitDT) + # correlation + cormatrix <- cor_flex(x = testmatrix, method = cor) + cordist <- stats::as.dist(1 - cormatrix, diag = TRUE, upper = TRUE) + corclus <- stats::hclust(d = cordist, method = distance) + cordend <- stats::as.dendrogram(object = corclus) + + + if (show_dend == TRUE) { + # plot dendrogram + graphics::plot(cordend) + + # add horizontal lines + if (!is.null(h)) { + graphics::abline(h = h, col = h_color) + } + } + + + splitList <- .node_clusters(hclus_obj = corclus, verbose = verbose) + + splitDT <- data.table::as.data.table(t_flex( + data.table::as.data.table(splitList[[2]]))) + colnames(splitDT) <- c("node_h", "tree_1", "tree_2") + splitDT[, nodeID := paste0("node_", 1:.N)] + + return(splitDT) } @@ -2934,7 +3158,8 @@ getDendrogramSplits = function(gobject, #' @title Projection of cluster labels #' @name doClusterProjection -#' @description Use a fast KNN classifier to predict labels from a smaller giotto object +#' @description Use a fast KNN classifier to predict labels from a smaller +#' giotto object #' @param target_gobject target giotto object #' @param target_cluster_label_name name for predicted clusters #' @param spat_unit spatial unit @@ -2944,159 +3169,183 @@ getDendrogramSplits = function(gobject, #' @param reduction reduction on cells or features (default = cells) #' @param reduction_method shared reduction method (default = pca space) #' @param reduction_name name of shared reduction space (default name = 'pca') -#' @param dimensions_to_use dimensions to use in shared reduction space (default = 1:10) +#' @param dimensions_to_use dimensions to use in shared reduction space +#' (default = 1:10) #' @param knn_k number of k-neighbors to train a KNN classifier #' @param prob output probabilities together with label predictions #' @param algorithm nearest neighbor search algorithm #' @param return_gobject return giotto object #' @return giotto object (default) or data.table with cell metadata #' -#' @details Function to train a KNN with \code{\link[FNN]{knn}}. The training data -#' is obtained from the source giotto object (source_gobject) using existing annotations -#' within the cell metadata. Cells without annotation/labels from the target giotto -#' object (target_gobject) will receive predicted labels (and optional probabilities -#' with prob = TRUE). +#' @details Function to train a KNN with \code{\link[FNN]{knn}}. The training +#' data is obtained from the source giotto object (source_gobject) using +#' existing annotations within the cell metadata. Cells without +#' annotation/labels from the target giotto object (target_gobject) will +#' receive predicted labels (and optional probabilities with prob = TRUE). #' -#' **IMPORTANT** This projection assumes that you're using the same dimension reduction -#' space (e.g. PCA) and number of dimensions (e.g. first 10 PCs) to train the KNN -#' classifier as you used to create the initial annotations/labels in the source -#' Giotto object. +#' **IMPORTANT** This projection assumes that you're using the same dimension +#' reduction space (e.g. PCA) and number of dimensions (e.g. 
first 10 PCs) to +#' train the KNN classifier as you used to create the initial +#' annotations/labels in the source Giotto object. #' -#' Altogether this is a convenience function that allow you to work with very big -#' data as you can predict cell labels on a smaller & subsetted Giotto object and then -#' project the cell labels to the remaining cells in the target Giotto object. +#' Altogether this is a convenience function that allow you to work with very +#' big data as you can predict cell labels on a smaller & subsetted Giotto +#' object and then project the cell labels to the remaining cells in the target +#' Giotto object. #' @export -doClusterProjection = function(target_gobject, - target_cluster_label_name = 'knn_labels', - spat_unit = NULL, - feat_type = NULL, - source_gobject, - source_cluster_labels = NULL, - reduction = 'cells', - reduction_method = 'pca', - reduction_name = 'pca', - dimensions_to_use = 1:10, - knn_k = 10, - prob = FALSE, - algorithm = c("kd_tree", - "cover_tree", "brute"), - return_gobject = TRUE) { - - # NSE vars - cell_ID = temp_name_prob = NULL - - # package check for dendextend - package_check(pkg_name = "FNN", repository = "CRAN") - - spat_unit = set_default_spat_unit(gobject = target_gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = target_gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # identify clusters from source object and create annotation vector - cell_meta_source <- getCellMetadata(gobject = source_gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table') - source_annot_vec = cell_meta_source[[source_cluster_labels]] - names(source_annot_vec) = cell_meta_source[['cell_ID']] - - # create the matrix from the target object that you want to use for the kNN classifier - # the matrix should be the same for the source and target objects (e.g. 
same PCA space) - dim_obj = get_dimReduction(gobject = target_gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, - reduction_method = reduction_method, - name = reduction_name, - output = 'dimObj') - - dim_coord = dim_obj[] - dimensions_to_use = dimensions_to_use[dimensions_to_use %in% 1:ncol(dim_coord)] - matrix_to_use = dim_coord[, dimensions_to_use] - - ## create the training and testset from the matrix - - # the training set is the set of cell IDs that are in both the source (w/ labels) - # and target giotto object - train = matrix_to_use[rownames(matrix_to_use) %in% names(source_annot_vec),] - train = train[match(names(source_annot_vec), rownames(train)), ] - - # the test set are the remaining cell_IDs that need a label - test = matrix_to_use[!rownames(matrix_to_use) %in% names(source_annot_vec),] - cl = source_annot_vec - - # make prediction - knnprediction = FNN::knn(train = train, test = test, - cl = cl, k = knn_k, prob = prob, - algorithm = algorithm) - - # get prediction results - knnprediction_vec = as.vector(knnprediction) - names(knnprediction_vec) = rownames(test) - - # add probability information - if(isTRUE(prob)) { - probs = attr(knnprediction, "prob") - names(probs) = rownames(test) - } - - - - # create annotation vector for all cell IDs (from source and predicted) - all_vec = c(source_annot_vec, knnprediction_vec) - - cell_meta_target <- getCellMetadata(gobject = target_gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table') - # data.table variables - temp_name = NULL - cell_meta_target[, temp_name := all_vec[cell_ID]] - - if(isTRUE(prob)) { - - cell_meta_target[, temp_name_prob := probs[cell_ID]] - cell_meta_target = cell_meta_target[,.(cell_ID, temp_name, temp_name_prob)] - cell_meta_target[, temp_name_prob := ifelse(is.na(temp_name_prob), 1, temp_name_prob)] - - data.table::setnames(cell_meta_target, - old = c('temp_name', 'temp_name_prob'), - new = c(target_cluster_label_name, paste0(target_cluster_label_name,'_prob'))) - } else { - - cell_meta_target = cell_meta_target[,.(cell_ID, temp_name)] - data.table::setnames(cell_meta_target, - old = 'temp_name', - new = target_cluster_label_name) - } - - - if(return_gobject) { - - if(isTRUE(prob)) { - - prob_label = paste0(target_cluster_label_name,'_prob') - - target_gobject = addCellMetadata(gobject = target_gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = cell_meta_target[, c('cell_ID', target_cluster_label_name, prob_label), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID') +doClusterProjection <- function(target_gobject, + target_cluster_label_name = "knn_labels", + spat_unit = NULL, + feat_type = NULL, + source_gobject, + source_cluster_labels = NULL, + reduction = "cells", + reduction_method = "pca", + reduction_name = "pca", + dimensions_to_use = 1:10, + knn_k = 10, + prob = FALSE, + algorithm = c( + "kd_tree", + "cover_tree", "brute" + ), + return_gobject = TRUE) { + # NSE vars + cell_ID <- temp_name_prob <- NULL + + # package check for dendextend + package_check(pkg_name = "FNN", repository = "CRAN") + + spat_unit <- set_default_spat_unit( + gobject = target_gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = target_gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # identify clusters from source object and create annotation vector + cell_meta_source <- getCellMetadata( + gobject = source_gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = 
"data.table" + ) + source_annot_vec <- cell_meta_source[[source_cluster_labels]] + names(source_annot_vec) <- cell_meta_source[["cell_ID"]] + + # create the matrix from the target object that you want to use for the kNN classifier + # the matrix should be the same for the source and target objects (e.g. same PCA space) + dim_obj <- getDimReduction( + gobject = target_gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, + reduction_method = reduction_method, + name = reduction_name, + output = "dimObj" + ) + + dim_coord <- dim_obj[] + dimensions_to_use <- dimensions_to_use[ + dimensions_to_use %in% 1:ncol(dim_coord)] + matrix_to_use <- dim_coord[, dimensions_to_use] + + ## create the training and testset from the matrix + + # the training set is the set of cell IDs that are in both the source + # (w/ labels) + # and target giotto object + train <- matrix_to_use[ + rownames(matrix_to_use) %in% names(source_annot_vec), ] + train <- train[match(names(source_annot_vec), rownames(train)), ] + + # the test set are the remaining cell_IDs that need a label + test <- matrix_to_use[ + !rownames(matrix_to_use) %in% names(source_annot_vec), ] + cl <- source_annot_vec + + # make prediction + knnprediction <- FNN::knn( + train = train, test = test, + cl = cl, k = knn_k, prob = prob, + algorithm = algorithm + ) + + # get prediction results + knnprediction_vec <- as.vector(knnprediction) + names(knnprediction_vec) <- rownames(test) + + # add probability information + if (isTRUE(prob)) { + probs <- attr(knnprediction, "prob") + names(probs) <- rownames(test) + } + + + + # create annotation vector for all cell IDs (from source and predicted) + all_vec <- c(source_annot_vec, knnprediction_vec) + + cell_meta_target <- getCellMetadata( + gobject = target_gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table" + ) + # data.table variables + temp_name <- NULL + cell_meta_target[, temp_name := all_vec[cell_ID]] + + if (isTRUE(prob)) { + cell_meta_target[, temp_name_prob := probs[cell_ID]] + cell_meta_target <- cell_meta_target[ + , .(cell_ID, temp_name, temp_name_prob)] + cell_meta_target[, temp_name_prob := ifelse( + is.na(temp_name_prob), 1, temp_name_prob)] + + data.table::setnames(cell_meta_target, + old = c("temp_name", "temp_name_prob"), + new = c(target_cluster_label_name, + paste0(target_cluster_label_name, "_prob")) + ) } else { - target_gobject = addCellMetadata(gobject = target_gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = cell_meta_target[, c('cell_ID', target_cluster_label_name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID') + cell_meta_target <- cell_meta_target[, .(cell_ID, temp_name)] + data.table::setnames(cell_meta_target, + old = "temp_name", + new = target_cluster_label_name + ) } - } else { - return(cell_meta_target) - } + if (return_gobject) { + if (isTRUE(prob)) { + prob_label <- paste0(target_cluster_label_name, "_prob") + + target_gobject <- addCellMetadata( + gobject = target_gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = cell_meta_target[ + , c("cell_ID", target_cluster_label_name, prob_label), + with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + } else { + target_gobject <- addCellMetadata( + gobject = target_gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = cell_meta_target[ + , c("cell_ID", target_cluster_label_name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + } + } else { + 
return(cell_meta_target) + } } - diff --git a/R/convenience.R b/R/convenience.R index 3cc7a7b7d..3bbf7ea4b 100644 --- a/R/convenience.R +++ b/R/convenience.R @@ -1,5 +1,4 @@ - -# ** Spatial Method-Specific Convenience Functions for Giotto Object Creation ** # +# Spatial Method-Specific Convenience Functions for Giotto Object Creation # @@ -14,12 +13,13 @@ #' reader functions should be built using it as a base. #' @param spat_method spatial method for which the data is being read #' @param data_dir exported data directory to read from -#' @param dir_items named list of directory items to expect and keywords to match +#' @param dir_items named list of directory items to expect and keywords to +#' match #' @param data_to_use character. Which type(s) of expression data to build the #' gobject with. Values should match with a *workflow* item in require_data_DT #' (see details) -#' @param require_data_DT data.table detailing if expected data items are required -#' or optional for each \code{data_to_use} *workflow* +#' @param require_data_DT data.table detailing if expected data items are +#' required or optional for each \code{data_to_use} *workflow* #' @param cores cores to use #' @param verbose be verbose #' @param toplevel stackframes back where the user-facing function was called. @@ -30,11 +30,11 @@ #' \item{1. detection of items within \code{data_dir} by looking for keywords #' assigned through \code{dir_items}} #' \item{2. check of detected items to see if everything needed has been found. -#' Dictionary of necessary vs optional items for each \code{data_to_use} *workflow* -#' is provided through \code{require_data_DT}} -#' \item{3. if multiple filepaths are found to be matching then select the first -#' one. This function is only intended to find the first level subdirectories -#' and files.} +#' Dictionary of necessary vs optional items for each \code{data_to_use} +#' *workflow* is provided through \code{require_data_DT}} +#' \item{3. if multiple filepaths are found to be matching then select the +#' first one. This function is only intended to find the first level +#' subdirectories and files.} #' } #' #' **Example reader implementation:** @@ -82,93 +82,95 @@ NULL #' @describeIn read_data_folder Should not be used directly #' @keywords internal .read_data_folder <- function(spat_method = NULL, - data_dir = NULL, - dir_items, - data_to_use, - load_format = NULL, - require_data_DT, - cores = NA, - verbose = NULL, - toplevel = 2L) { - - ch = box_chars() - - # 0. check params - if(is.null(data_dir) || - !dir.exists(data_dir)) { - .gstop(.n = toplevel, 'The full path to a', spat_method, 'directory must be given.') - } - vmsg(.v = verbose, 'A structured', spat_method, 'directory will be used') - if(!data_to_use %in% require_data_DT$workflow) { - .gstop(.n = toplevel, 'Data requirements for data_to_use not found in require_data_DT') - } - - # 1. detect items - dir_items = lapply_flex(dir_items, function(x) { - Sys.glob(paths = file.path(data_dir, x)) - }, cores = cores) - # (length = 1 if present, length = 0 if missing) - dir_items_lengths = lengths(dir_items) - - # 2. 
check directory contents - vmsg(.v = verbose, 'Checking directory contents...') - - for(item in names(dir_items)) { - - # IF ITEM FOUND - - if(dir_items_lengths[[item]] > 0) { - # print found items if verbose = "debug" - if(isTRUE(verbose)) { - vmsg( - .v = verbose, .is_debug = TRUE, - .initial = paste0(ch$s, '> '), - item, ' found' - ) - for(item_i in seq_along(dir_items[[item]])) { # print found item names - subItem = gsub(pattern = '.*/', replacement = '', x = dir_items[[item]][[item_i]]) - vmsg( - .v = verbose, .is_debug = TRUE, - .initial = paste0(ch$s, ch$s, ch$l, ch$h, ch$h), - subItem - ) - } - } - } else { - - # IF ITEM MISSING - # necessary (error) - # optional (warning) - - # data.table variables - workflow = needed = filetype = NULL - + data_dir = NULL, + dir_items, + data_to_use, + load_format = NULL, + require_data_DT, + cores = NA, + verbose = NULL, + toplevel = 2L) { + ch <- box_chars() + + # 0. check params + if (is.null(data_dir) || + !dir.exists(data_dir)) { + .gstop(.n = toplevel, "The full path to a", spat_method, + "directory must be given.") + } + vmsg(.v = verbose, "A structured", spat_method, "directory will be used") + if (!data_to_use %in% require_data_DT$workflow) { + .gstop(.n = toplevel, + "Data requirements for data_to_use not found in require_data_DT") + } - require_data_DT = require_data_DT[workflow == data_to_use,] - if(!is.null(load_format)) require_data_DT = require_data_DT[filetype == load_format,] + # 1. detect items + dir_items <- lapply_flex(dir_items, function(x) { + Sys.glob(paths = file.path(data_dir, x)) + }, cores = cores) + # (length = 1 if present, length = 0 if missing) + dir_items_lengths <- lengths(dir_items) + + # 2. check directory contents + vmsg(.v = verbose, "Checking directory contents...") + + for (item in names(dir_items)) { + # IF ITEM FOUND + + if (dir_items_lengths[[item]] > 0) { + # print found items if verbose = "debug" + if (isTRUE(verbose)) { + vmsg( + .v = verbose, .is_debug = TRUE, + .initial = paste0(ch$s, "> "), + item, " found" + ) + for (item_i in seq_along(dir_items[[item]])) { + # print found item names + subItem <- gsub(pattern = ".*/", replacement = "", + x = dir_items[[item]][[item_i]]) + vmsg( + .v = verbose, .is_debug = TRUE, + .initial = paste0(ch$s, ch$s, ch$l, ch$h, ch$h), + subItem + ) + } + } + } else { + # IF ITEM MISSING + # necessary (error) + # optional (warning) + + # data.table variables + workflow <- needed <- filetype <- NULL + + + require_data_DT <- require_data_DT[workflow == data_to_use, ] + if (!is.null(load_format)) + require_data_DT <- require_data_DT[filetype == load_format, ] + + if (item %in% require_data_DT[needed == TRUE, item]) + stop(item, " is missing") + if (item %in% require_data_DT[needed == FALSE, item]) + warning(item, "is missing (optional)") + } + } - if(item %in% require_data_DT[needed == TRUE, item]) stop(item, ' is missing\n') - if(item %in% require_data_DT[needed == FALSE, item]) warning(item, 'is missing (optional)\n') + # 3. select first path in list if multiple are detected + if (any(dir_items_lengths > 1)) { + warning(wrap_txt("Multiple matches for expected directory item(s). + First matching item selected")) + multiples <- which(dir_items_lengths > 1) + for (mult_i in multiples) { + message(names(dir_items)[[mult_i]], "multiple matches found:") + print(dir_items[[mult_i]]) + dir_items[[mult_i]] <- dir_items[[mult_i]][[1]] + } } - } - - # 3. 
select first path in list if multiple are detected - if(any(dir_items_lengths > 1)) { - warning(wrap_txt('Multiple matches for expected directory item(s). - First matching item selected')) - - multiples = which(dir_items_lengths > 1) - for(mult_i in multiples) { - message(names(dir_items)[[mult_i]], 'multiple matches found:') - print(dir_items[[mult_i]]) - dir_items[[mult_i]] = dir_items[[mult_i]][[1]] - } - } - vmsg(.v = verbose, 'Directory check done') - - return(dir_items) + vmsg(.v = verbose, "Directory check done") + return(dir_items) } @@ -190,7 +192,8 @@ NULL #' @title Create a giotto object from 10x visium data #' @name createGiottoVisiumObject -#' @description Create Giotto object directly from a 10X visium folder. Also accepts visium H5 outputs. +#' @description Create Giotto object directly from a 10X visium folder. Also +#' accepts visium H5 outputs. #' #' @param visium_dir path to the 10X visium directory [required] #' @param expr_data raw or filtered data (see details) @@ -198,8 +201,8 @@ NULL #' @param h5_visium_path path to visium 10X .h5 file #' @param h5_gene_ids gene names as symbols (default) or ensemble gene ids #' @param h5_tissue_positions_path path to tissue locations (.csv file) -#' @param h5_image_png_path path to tissue .png file (optional). Image autoscaling -#' looks for matches in the filename for either 'hires' or 'lowres' +#' @param h5_image_png_path path to tissue .png file (optional). Image +#' autoscaling looks for matches in the filename for either 'hires' or 'lowres' #' @param h5_json_scalefactors_path path to .json scalefactors (optional) #' @param png_name select name of png to use (see details) #' @param do_manual_adj deprecated @@ -207,9 +210,12 @@ NULL #' @param xmin_adj deprecated #' @param ymax_adj deprecated #' @param ymin_adj deprecated -#' @param instructions list of instructions or output result from \code{\link[GiottoClass]{createGiottoInstructions}} -#' @param cores how many cores or threads to use to read data if paths are provided -#' @param expression_matrix_class class of expression matrix to use (e.g. 'dgCMatrix', 'DelayedArray') +#' @param instructions list of instructions or output result from +#' \code{\link[GiottoClass]{createGiottoInstructions}} +#' @param cores how many cores or threads to use to read data if paths are +#' provided +#' @param expression_matrix_class class of expression matrix to use +#' (e.g. 'dgCMatrix', 'DelayedArray') #' @param h5_file optional path to create an on-disk h5 file #' @param verbose be verbose #' @@ -232,78 +238,74 @@ NULL #' } #' #' @export -createGiottoVisiumObject = function(visium_dir = NULL, - expr_data = c('raw', 'filter'), - gene_column_index = 1, - h5_visium_path = NULL, - h5_gene_ids = c('symbols', 'ensembl'), - h5_tissue_positions_path = NULL, - h5_image_png_path = NULL, - h5_json_scalefactors_path = NULL, - png_name = NULL, - do_manual_adj = FALSE, # deprecated - xmax_adj = 0, # deprecated - xmin_adj = 0, # deprecated - ymax_adj = 0, # deprecated - ymin_adj = 0, # deprecated - instructions = NULL, - expression_matrix_class = c("dgCMatrix", "DelayedArray"), - h5_file = NULL, - cores = NA, - verbose = NULL) { - - # NSE vars - barcode = row_pxl = col_pxl = in_tissue = array_row = array_col = NULL - - # handle deprecations - img_dep_msg <- "The params 'do_manual_adj', 'xmax_adj', 'xmin_adj', 'ymax_adj, - 'ymin_adj' are no longer used. Please use the automated workflow." 
- if(!isFALSE(do_manual_adj) || - xmax_adj != 0 || - xmin_adj != 0 || - ymax_adj != 0 || - ymin_adj != 0) { - stop(wrap_txt(img_dep_msg)) - } - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - - # get arguments list for object creation - if(!is.null(h5_visium_path)) { - - argslist <- .visium_read_h5( - h5_visium_path = h5_visium_path, # expression matrix file - h5_gene_ids = h5_gene_ids, # symbol or ensembl - h5_tissue_positions_path = h5_tissue_positions_path, - h5_image_png_path = h5_image_png_path, - h5_json_scalefactors_path = h5_json_scalefactors_path, - verbose = verbose - ) +createGiottoVisiumObject <- function(visium_dir = NULL, + expr_data = c("raw", "filter"), + gene_column_index = 1, + h5_visium_path = NULL, + h5_gene_ids = c("symbols", "ensembl"), + h5_tissue_positions_path = NULL, + h5_image_png_path = NULL, + h5_json_scalefactors_path = NULL, + png_name = NULL, + do_manual_adj = FALSE, # deprecated + xmax_adj = 0, # deprecated + xmin_adj = 0, # deprecated + ymax_adj = 0, # deprecated + ymin_adj = 0, # deprecated + instructions = NULL, + expression_matrix_class = c("dgCMatrix", "DelayedArray"), + h5_file = NULL, + cores = NA, + verbose = NULL) { + # NSE vars + barcode <- row_pxl <- col_pxl <- in_tissue <- array_row <- array_col <- NULL + + # handle deprecations + img_dep_msg <- "The params 'do_manual_adj', 'xmax_adj', 'xmin_adj', + 'ymax_adj', 'ymin_adj' are no longer used. + Please use the automated workflow." + if (!isFALSE(do_manual_adj) || + xmax_adj != 0 || + xmin_adj != 0 || + ymax_adj != 0 || + ymin_adj != 0) { + stop(wrap_txt(img_dep_msg)) + } - } else { + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) - argslist <- .visium_read_folder( - visium_dir = visium_dir, - expr_data = expr_data, # type of expression matrix to load - gene_column_index = gene_column_index, # symbol or ensembl - png_name = png_name, - verbose = verbose - ) - } + # get arguments list for object creation + if (!is.null(h5_visium_path)) { + argslist <- .visium_read_h5( + h5_visium_path = h5_visium_path, # expression matrix file + h5_gene_ids = h5_gene_ids, # symbol or ensembl + h5_tissue_positions_path = h5_tissue_positions_path, + h5_image_png_path = h5_image_png_path, + h5_json_scalefactors_path = h5_json_scalefactors_path, + verbose = verbose + ) + } else { + argslist <- .visium_read_folder( + visium_dir = visium_dir, + expr_data = expr_data, # type of expression matrix to load + gene_column_index = gene_column_index, # symbol or ensembl + png_name = png_name, + verbose = verbose + ) + } - # additional args to pass to object creation - argslist$verbose <- verbose - argslist$expression_matrix_class <- expression_matrix_class - argslist$h5_file <- h5_file - argslist$instructions <- instructions + # additional args to pass to object creation + argslist$verbose <- verbose + argslist$expression_matrix_class <- expression_matrix_class + argslist$h5_file <- h5_file + argslist$instructions <- instructions - giotto_object <- do.call(.visium_create, args = argslist) + giotto_object <- do.call(.visium_create, args = argslist) - return(giotto_object) + return(giotto_object) } @@ -314,201 +316,215 @@ createGiottoVisiumObject = function(visium_dir = NULL, .visium_create <- function( - expr_counts_path, - h5_gene_ids = NULL, # h5 - gene_column_index = NULL, # folder - tissue_positions_path, - image_path = NULL, - scale_json_path = 
NULL, - png_name = NULL, - instructions = NULL, - expression_matrix_class = c("dgCMatrix", "DelayedArray"), - h5_file = NULL, - verbose = NULL -) { - - # NSE vars - barcode <- cell_ID <- row_pxl <- col_pxl <- in_tissue <- array_row <- - array_col <- NULL + expr_counts_path, + h5_gene_ids = NULL, # h5 + gene_column_index = NULL, # folder + tissue_positions_path, + image_path = NULL, + scale_json_path = NULL, + png_name = NULL, + instructions = NULL, + expression_matrix_class = c("dgCMatrix", "DelayedArray"), + h5_file = NULL, + verbose = NULL) { + # NSE vars + barcode <- cell_ID <- row_pxl <- col_pxl <- in_tissue <- array_row <- + array_col <- NULL + + # Assume path checking has been done + + # 1. expression + if (!is.null(h5_gene_ids)) { + expr_results <- get10Xmatrix_h5( + path_to_data = expr_counts_path, + gene_ids = h5_gene_ids + ) + } else { + expr_results <- get10Xmatrix( + path_to_data = expr_counts_path, + gene_column_index = gene_column_index + ) + } - # Assume path checking has been done + # if expr_results is not a list, make it a list compatible with downstream + if (!is.list(expr_results)) expr_results <- list( + "Gene Expression" = expr_results) - # 1. expression - if (!is.null(h5_gene_ids)) { - expr_results <- get10Xmatrix_h5(path_to_data = expr_counts_path, - gene_ids = h5_gene_ids) - } else { - expr_results <- get10Xmatrix(path_to_data = expr_counts_path, - gene_column_index = gene_column_index) - } + # format expected data into list to be used with readExprData() + raw_matrix_list <- list("cell" = list("rna" = list( + "raw" = expr_results[["Gene Expression"]]))) - # if expr_results is not a list, make it a list compatible with downstream - if (!is.list(expr_results)) expr_results = list("Gene Expression" = expr_results) + # add protein expression data to list if it exists + if ("Antibody Capture" %in% names(expr_results)) { + raw_matrix_list$cell$protein$raw <- expr_results[["Antibody Capture"]] + } - # format expected data into list to be used with readExprData() - raw_matrix_list <- list("cell" = list("rna" = list("raw" = expr_results[["Gene Expression"]]))) - # add protein expression data to list if it exists - if ('Antibody Capture' %in% names(expr_results)) { - raw_matrix_list$cell$protein$raw <- expr_results[["Antibody Capture"]] - } + # 2. spatial locations + spatial_results <- data.table::fread(tissue_positions_path) + colnames(spatial_results) <- c("barcode", "in_tissue", "array_row", + "array_col", "col_pxl", "row_pxl") + spatial_results <- spatial_results[match(colnames( + raw_matrix_list$cell[[1]]$raw), barcode)] + data.table::setnames(spatial_results, old = "barcode", new = "cell_ID") + spatial_locs <- spatial_results[, .(cell_ID, row_pxl, -col_pxl)] + # flip x and y + colnames(spatial_locs) <- c("cell_ID", "sdimx", "sdimy") - # 2. spatial locations - spatial_results <- data.table::fread(tissue_positions_path) - colnames(spatial_results) <- c('barcode', 'in_tissue', 'array_row', 'array_col', 'col_pxl', 'row_pxl') - spatial_results <- spatial_results[match(colnames(raw_matrix_list$cell[[1]]$raw), barcode)] - data.table::setnames(spatial_results, old = "barcode", new = "cell_ID") - spatial_locs <- spatial_results[,.(cell_ID, row_pxl,-col_pxl)] # flip x and y - colnames(spatial_locs) <- c("cell_ID", 'sdimx', 'sdimy') + # 3. scalefactors (optional) + json_info <- .visium_read_scalefactors(scale_json_path) - # 3. scalefactors (optional) - json_info <- .visium_read_scalefactors(scale_json_path) + # 4. 
image (optional) + if (!is.null(image_path)) { + visium_png_list <- .visium_image( + image_path = image_path, + json_info = json_info, + verbose = verbose + ) + } + # 5. metadata + meta_results <- spatial_results[ + , .(cell_ID, in_tissue, array_row, array_col)] + expr_types <- names(raw_matrix_list$cell) + meta_list <- list() + for (etype in expr_types) { + meta_list[[etype]] <- meta_results + } - # 4. image (optional) - if (!is.null(image_path)) { - visium_png_list <- .visium_image( - image_path = image_path, - json_info = json_info, - verbose = verbose - ) - } - - # 5. metadata - meta_results <- spatial_results[,.(cell_ID, in_tissue, array_row, array_col)] - expr_types <- names(raw_matrix_list$cell) - meta_list <- list() - for (etype in expr_types) { - meta_list[[etype]] <- meta_results - } - - - # 6. giotto object - giotto_object <- createGiottoObject( - expression = raw_matrix_list, - spatial_locs = spatial_locs, - instructions = instructions, - cell_metadata = meta_list, - images = visium_png_list - ) - - - # 7. polygon information - if(!is.null(json_info)){ - visium_polygons = .visium_spot_poly( - spatlocs = spatial_locs, - json_scalefactors = json_info - ) - giotto_object = setPolygonInfo( - gobject = giotto_object, - x = visium_polygons, - centroids_to_spatlocs = FALSE, - verbose = FALSE, - initialize = TRUE + # 6. giotto object + giotto_object <- createGiottoObject( + expression = raw_matrix_list, + spatial_locs = spatial_locs, + instructions = instructions, + cell_metadata = meta_list, + images = visium_png_list ) - } - return(giotto_object) + + # 7. polygon information + if (!is.null(json_info)) { + visium_polygons <- .visium_spot_poly( + spatlocs = spatial_locs, + json_scalefactors = json_info + ) + giotto_object <- setPolygonInfo( + gobject = giotto_object, + x = visium_polygons, + centroids_to_spatlocs = FALSE, + verbose = FALSE, + initialize = TRUE + ) + } + + return(giotto_object) } # Find and check the filepaths within a structured visium directory .visium_read_folder <- function( - visium_dir = NULL, - expr_data = c("raw", "filter"), - gene_column_index = 1, - png_name = NULL, - verbose = NULL -) -{ - - vmsg(.v = verbose, "A structured visium directory will be used") - - ## check arguments - if(is.null(visium_dir)) .gstop('visium_dir needs to be a path to a visium directory \n') - visium_dir = path.expand(visium_dir) - if(!dir.exists(visium_dir)) .gstop(visium_dir, ' does not exist!') - expr_data = match.arg(expr_data, choices = c('raw', 'filter')) - - - ## 1. check expression - expr_counts_path <- switch( - expr_data, - "raw" = paste0(visium_dir, '/', 'raw_feature_bc_matrix/'), - "filter" = paste0(visium_dir, '/', 'filtered_feature_bc_matrix/') - ) - if (!file.exists(expr_counts_path)) .gstop(expr_counts_path, "does not exist!") + visium_dir = NULL, + expr_data = c("raw", "filter"), + gene_column_index = 1, + png_name = NULL, + verbose = NULL) { + vmsg(.v = verbose, "A structured visium directory will be used") + + ## check arguments + if (is.null(visium_dir)) + .gstop("visium_dir needs to be a path to a visium directory") + visium_dir <- path.expand(visium_dir) + if (!dir.exists(visium_dir)) .gstop(visium_dir, " does not exist!") + expr_data <- match.arg(expr_data, choices = c("raw", "filter")) + + + ## 1. 
check expression + expr_counts_path <- switch(expr_data, + "raw" = paste0(visium_dir, "/", "raw_feature_bc_matrix/"), + "filter" = paste0(visium_dir, "/", "filtered_feature_bc_matrix/") + ) + if (!file.exists(expr_counts_path)) + .gstop(expr_counts_path, "does not exist!") - ## 2. check spatial locations - spatial_dir = paste0(visium_dir, '/', 'spatial/') - tissue_positions_path = Sys.glob(paths = file.path(spatial_dir, 'tissue_positions*')) + ## 2. check spatial locations + spatial_dir <- paste0(visium_dir, "/", "spatial/") + tissue_positions_path <- Sys.glob( + paths = file.path(spatial_dir, "tissue_positions*")) - ## 3. check spatial image - if(is.null(png_name)) { - png_list = list.files(spatial_dir, pattern = "*.png") - png_name = png_list[1] - } - png_path = paste0(spatial_dir,'/',png_name) - if(!file.exists(png_path)) .gstop(png_path, ' does not exist!') + ## 3. check spatial image + if (is.null(png_name)) { + png_list <- list.files(spatial_dir, pattern = "*.png") + png_name <- png_list[1] + } + png_path <- paste0(spatial_dir, "/", png_name) + if (!file.exists(png_path)) .gstop(png_path, " does not exist!") - ## 4. check scalefactors - scalefactors_path = paste0(spatial_dir,'/','scalefactors_json.json') - if (!file.exists(scalefactors_path)) .gstop(scalefactors_path, "does not exist!") + ## 4. check scalefactors + scalefactors_path <- paste0(spatial_dir, "/", "scalefactors_json.json") + if (!file.exists(scalefactors_path)) + .gstop(scalefactors_path, "does not exist!") - list( - expr_counts_path = expr_counts_path, - gene_column_index = gene_column_index, - tissue_positions_path = tissue_positions_path, - image_path = png_path, - scale_json_path = scalefactors_path - ) + list( + expr_counts_path = expr_counts_path, + gene_column_index = gene_column_index, + tissue_positions_path = tissue_positions_path, + image_path = png_path, + scale_json_path = scalefactors_path + ) } .visium_read_h5 <- function( - h5_visium_path = h5_visium_path, # expression matrix - h5_gene_ids = h5_gene_ids, - h5_tissue_positions_path = h5_tissue_positions_path, - h5_image_png_path = h5_image_png_path, - h5_json_scalefactors_path = h5_json_scalefactors_path, - verbose = NULL -) { - - # 1. filepaths - vmsg(.v = verbose, "A path to an .h5 10X file was provided and will be used \n") - if (!file.exists(h5_visium_path)) .gstop("The provided path ", h5_visium_path, " does not exist \n") - if (is.null(h5_tissue_positions_path)) .gstop("A path to the tissue positions (.csv) needs to be provided to h5_tissue_positions_path \n") - if (!file.exists(h5_tissue_positions_path)) .gstop("The provided path ", h5_tissue_positions_path, " does not exist \n") - if (!is.null(h5_image_png_path)) { - if (!file.exists(h5_image_png_path)) - .gstop("The provided h5 image path ", h5_image_png_path, "does not exist. - Set to NULL to exclude or provide the correct path.\n") - } - if (!is.null(h5_json_scalefactors_path)) { - if (!file.exists(h5_json_scalefactors_path)) { - warning(wrap_txt( - "No file found at h5_json_scalefactors_path. - Scalefactors are needed for proper image alignment and polygon generation" - )) + h5_visium_path = h5_visium_path, # expression matrix + h5_gene_ids = h5_gene_ids, + h5_tissue_positions_path = h5_tissue_positions_path, + h5_image_png_path = h5_image_png_path, + h5_json_scalefactors_path = h5_json_scalefactors_path, + verbose = NULL) { + # 1. 
filepaths + vmsg(.v = verbose, + "A path to an .h5 10X file was provided and will be used") + if (!file.exists(h5_visium_path)) + .gstop("The provided path ", h5_visium_path, " does not exist") + if (is.null(h5_tissue_positions_path)) + .gstop("A path to the tissue positions (.csv) needs to be provided to + h5_tissue_positions_path") + if (!file.exists(h5_tissue_positions_path)) + .gstop("The provided path ", h5_tissue_positions_path, + " does not exist") + if (!is.null(h5_image_png_path)) { + if (!file.exists(h5_image_png_path)) { + .gstop("The provided h5 image path ", h5_image_png_path, + "does not exist. + Set to NULL to exclude or provide the correct path.") + } } - } - - list( - expr_counts_path = h5_visium_path, - h5_gene_ids = h5_gene_ids, - tissue_positions_path = h5_tissue_positions_path, - image_path = h5_image_png_path, - scale_json_path = h5_json_scalefactors_path - ) + if (!is.null(h5_json_scalefactors_path)) { + if (!file.exists(h5_json_scalefactors_path)) { + warning(wrap_txt( + "No file found at h5_json_scalefactors_path. + Scalefactors are needed for proper image alignment and + polygon generation" + )) + } + } + + list( + expr_counts_path = h5_visium_path, + h5_gene_ids = h5_gene_ids, + tissue_positions_path = h5_tissue_positions_path, + image_path = h5_image_png_path, + scale_json_path = h5_json_scalefactors_path + ) } @@ -533,29 +549,29 @@ createGiottoVisiumObject = function(visium_dir = NULL, #' for the "cell" spatial unit. #' @export addVisiumPolygons <- function(gobject, - scalefactor_path = NULL){ - assert_giotto(gobject) + scalefactor_path = NULL) { + assert_giotto(gobject) - visium_spat_locs = getSpatialLocations( - gobject = gobject, - spat_unit = "cell" - ) + visium_spat_locs <- getSpatialLocations( + gobject = gobject, + spat_unit = "cell" + ) - scalefactors_list = .visium_read_scalefactors( - json_path = scalefactor_path - ) + scalefactors_list <- .visium_read_scalefactors( + json_path = scalefactor_path + ) - visium_polygons = .visium_spot_poly( - spatlocs = visium_spat_locs, - json_scalefactors = scalefactors_list - ) + visium_polygons <- .visium_spot_poly( + spatlocs = visium_spat_locs, + json_scalefactors = scalefactors_list + ) - gobject = addGiottoPolygons( - gobject = gobject, - gpolygons = list(visium_polygons) - ) + gobject <- addGiottoPolygons( + gobject = gobject, + gpolygons = list(visium_polygons) + ) - return(gobject) + return(gobject) } @@ -571,58 +587,60 @@ addVisiumPolygons <- function(gobject, #' containing scalefactors for Visium data in the expected format. #' Returns NULL if no path is provided or if the file does not exist. #' @keywords internal -.visium_read_scalefactors = function(json_path = NULL) { +.visium_read_scalefactors <- function(json_path = NULL) { + if (!checkmate::test_file_exists(json_path)) { + if (!is.null(json_path)) { + warning("scalefactors not discovered at: \n", + json_path, call. = FALSE) + } + return(NULL) + } - if (!checkmate::test_file_exists(json_path)) { - if (!is.null(json_path)) { - warning('scalefactors not discovered at: \n', json_path, call. 
= FALSE) + json_scalefactors <- jsonlite::read_json(json_path) + + # Intial assertion that json dimensions are appropriate + checkmate::assert_list( + x = json_scalefactors, + types = "numeric", + min.len = 4L, + max.len = 5L + ) + + expected_json_names <- c( + "regist_target_img_scalef", # NEW as of 2023 + "spot_diameter_fullres", + "tissue_hires_scalef", + "fiducial_diameter_fullres", + "tissue_lowres_scalef" + ) + + # Visium assay with chemistry v2 contains an additional + # keyword in the json file + new_format_2023 <- checkmate::test_list( + x = json_scalefactors, + types = "numeric", + len = 5L + ) + + # If the scalefactors are of size 4 (older assay), clip the new keyword + if (!new_format_2023) expected_json_names <- expected_json_names[2:5] + + if (!setequal(names(json_scalefactors), expected_json_names)) { + warning(GiottoUtils::wrap_txt( + "h5 scalefactors json names differ from expected. + [Expected]:", expected_json_names, "\n", + "[Actual]:", names(json_scalefactors) + )) } - return(NULL) - } - - json_scalefactors = jsonlite::read_json(json_path) - - # Intial assertion that json dimensions are appropriate - checkmate::assert_list( - x = json_scalefactors, - types = 'numeric', - min.len = 4L, - max.len = 5L - ) - - expected_json_names = c( - "regist_target_img_scalef", # NEW as of 2023 - "spot_diameter_fullres", - "tissue_hires_scalef", - "fiducial_diameter_fullres", - "tissue_lowres_scalef" - ) - - # Visium assay with chemistry v2 contains an additional - # keyword in the json file - new_format_2023 = checkmate::test_list( - x = json_scalefactors, - types = 'numeric', - len = 5L - ) - - # If the scalefactors are of size 4 (older assay), clip the new keyword - if (!new_format_2023) expected_json_names = expected_json_names[2:5] - - if (!setequal(names(json_scalefactors), expected_json_names)) { - warning(GiottoUtils::wrap_txt( - 'h5 scalefactors json names differ from expected. - [Expected]:', expected_json_names, '\n', - '[Actual]:', names(json_scalefactors))) - } - - return (json_scalefactors) + + return(json_scalefactors) } #' @title Calculate Pixel to Micron Scalefactor #' @name visium_micron_scalefactor -#' @param json_scalefactors list of scalefactors from .visium_read_scalefactors() +#' @param json_scalefactors list of scalefactors from +#' .visium_read_scalefactors() #' @return scale factor for converting pixel to micron #' @details #' Calculates pixel to micron scalefactor. @@ -632,10 +650,10 @@ addVisiumPolygons <- function(gobject, #' of spatial information. 
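# --- Editor's illustrative sketch, not part of this diff -------------------
# The pixel-to-micron conversion used below in .visium_micron_scale():
# Visium spots are 55 microns wide, so with a hypothetical scalefactors JSON
# reporting spot_diameter_fullres of about 89 pixels:
json_scalefactors <- list(spot_diameter_fullres = 89)
px_to_micron <- 55 / json_scalefactors$spot_diameter_fullres
# px_to_micron is ~0.62; multiplying full-resolution pixel coordinates by this
# factor expresses them in microns.
# ---------------------------------------------------------------------------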
#' @keywords internal .visium_micron_scale <- function(json_scalefactors) { - # visium spots diameter : 55 micron - # diameter of a spot at this spatial scaling : scalefactor_list$spot_diameter_fullres - px_to_micron <- 55 / json_scalefactors$spot_diameter_fullres - return (px_to_micron) + # visium spots diameter : 55 micron + # diameter of a spot at this spatial scaling : scalefactor_list$spot_diameter_fullres + px_to_micron <- 55 / json_scalefactors$spot_diameter_fullres + return(px_to_micron) } @@ -643,7 +661,8 @@ addVisiumPolygons <- function(gobject, #' @name .visium_spot_poly #' @param spatlocs spatial locations data.table or `spatLocsObj` containing #' centroid locations of visium spots -#' @param json_scalefactors list of scalefactors from .visium_read_scalefactors() +#' @param json_scalefactors list of scalefactors from +#' .visium_read_scalefactors() #' @return giottoPolygon object #' @details #' Creates circular polygons for spatial representation of @@ -651,24 +670,24 @@ addVisiumPolygons <- function(gobject, #' @keywords internal #' @md .visium_spot_poly <- function(spatlocs = NULL, - json_scalefactors) { - - if(inherits(spatlocs, "spatLocsObj")){ - spatlocs <- spatlocs[] - } - - vis_spot_poly <- GiottoClass::circleVertices( - radius = json_scalefactors$spot_diameter_fullres/2 - ) + json_scalefactors) { + if (inherits(spatlocs, "spatLocsObj")) { + spatlocs <- spatlocs[] + } - GiottoClass::polyStamp( - stamp_dt = vis_spot_poly, - spatlocs = spatlocs, - verbose = FALSE - ) %>% - createGiottoPolygonsFromDfr(calc_centroids = TRUE, - verbose = FALSE) + vis_spot_poly <- GiottoClass::circleVertices( + radius = json_scalefactors$spot_diameter_fullres / 2 + ) + GiottoClass::polyStamp( + stamp_dt = vis_spot_poly, + spatlocs = spatlocs, + verbose = FALSE + ) %>% + createGiottoPolygonsFromDfr( + calc_centroids = TRUE, + verbose = FALSE + ) } @@ -680,72 +699,70 @@ addVisiumPolygons <- function(gobject, # image_path should be expected to be full filepath # should only be used when do_manual_adj (deprecated) is FALSE .visium_image <- function( - image_path, - json_info = NULL, - micron_scale = FALSE, - verbose = NULL -) { - - # assume image already checked - vmsg(.v = verbose, .initial = " - ", "found image") - - # 1. determine image scalefactor to use ----------------------------------- # - if (!is.null(json_info)) checkmate::assert_list(json_info) - png_name <- basename(image_path) # used for name pattern matching only - - if (is.null(json_info)) { # if none provided - warning(wrap_txt( - 'No scalefactors json info provided. - Visium image scale_factor defaulting to 1' - )) - scale_factor = 1 - - } else { # if provided - - scale_factor <- NULL # initial value - - # determine type of visium image - visium_img_type <- NULL - possible_types <- c("lowres", "hires") - for (img_type in possible_types) { - if (grepl(img_type, png_name)) visium_img_type <- img_type - } + image_path, + json_info = NULL, + micron_scale = FALSE, + verbose = NULL) { + # assume image already checked + vmsg(.v = verbose, .initial = " - ", "found image") + + # 1. determine image scalefactor to use ---------------------------------- # + if (!is.null(json_info)) checkmate::assert_list(json_info) + png_name <- basename(image_path) # used for name pattern matching only + + if (is.null(json_info)) { # if none provided + warning(wrap_txt( + "No scalefactors json info provided. 
+ Visium image scale_factor defaulting to 1" + )) + scale_factor <- 1 + } else { # if provided + + scale_factor <- NULL # initial value + + # determine type of visium image + visium_img_type <- NULL + possible_types <- c("lowres", "hires") + for (img_type in possible_types) { + if (grepl(img_type, png_name)) visium_img_type <- img_type + } - if (is.null(visium_img_type)) { # if not recognized visium image type - .gstop( - "\'image_path\' filename did not partial match either \'lowres\' or \'hires\'. - Ensure specified image is either the Visium lowres or hires image and rename it accordingly" - ) - } + if (is.null(visium_img_type)) { # if not recognized visium image type + .gstop( + "\'image_path\' filename did not partial match either + \'lowres\' or \'hires\'. Ensure specified image is either the + Visium lowres or hires image and rename it accordingly" + ) + } - vmsg(.v = verbose, .initial = " - ", - "found scalefactors. - attempting automatic alignment for the", - str_quote(visium_img_type), "image\n\n") + vmsg( + .v = verbose, .initial = " - ", + "found scalefactors. attempting automatic alignment for the", + str_quote(visium_img_type), "image\n\n" + ) - scale_factor <- switch( - visium_img_type, - "lowres" = json_info[["tissue_lowres_scalef"]], - "hires" = json_info[["tissue_hires_scalef"]] - ) - } + scale_factor <- switch(visium_img_type, + "lowres" = json_info[["tissue_lowres_scalef"]], + "hires" = json_info[["tissue_hires_scalef"]] + ) + } - if (isTRUE(micron_scale)) { - scale_factor <- scale_factor * .visium_micron_scale(json_info) - } + if (isTRUE(micron_scale)) { + scale_factor <- scale_factor * .visium_micron_scale(json_info) + } - # 2. create image --------------------------------------------------------- # - visium_img <- createGiottoLargeImage( - raster_object = image_path, - name = "image", - negative_y = TRUE, - scale_factor = (1/scale_factor) - ) + # 2. create image -------------------------------------------------------- # + visium_img <- createGiottoLargeImage( + raster_object = image_path, + name = "image", + negative_y = TRUE, + scale_factor = (1 / scale_factor) + ) - visium_img_list = list(visium_img) - names(visium_img_list) = c('image') + visium_img_list <- list(visium_img) + names(visium_img_list) <- c("image") - return(visium_img_list) + return(visium_img_list) } @@ -775,39 +792,38 @@ addVisiumPolygons <- function(gobject, #' if image_file is a list. 
#' @export createMerscopeLargeImage <- function(image_file, - transforms_file, - name = 'image') { - - checkmate::assert_character(transforms_file) - tfsDT <- data.table::fread(transforms_file) - if (inherits(image_file, "character")) { - image_file <- as.list(image_file) - } - checkmate::assert_list(image_file) - - scalef <- c(1/tfsDT[[1,1]], 1/tfsDT[[2,2]]) - x_shift <- -tfsDT[[1,3]]/tfsDT[[1,1]] - y_shift <- -tfsDT[[2,3]]/tfsDT[[2,2]] - - out <- lapply(seq_along(image_file), function(i) { - gimg <- createGiottoLargeImage( - raster_object = image_file[[i]], - name = name[[i]], - scale_factor = scalef, - negative_y = FALSE - ) + transforms_file, + name = "image") { + checkmate::assert_character(transforms_file) + tfsDT <- data.table::fread(transforms_file) + if (inherits(image_file, "character")) { + image_file <- as.list(image_file) + } + checkmate::assert_list(image_file) + + scalef <- c(1 / tfsDT[[1, 1]], 1 / tfsDT[[2, 2]]) + x_shift <- -tfsDT[[1, 3]] / tfsDT[[1, 1]] + y_shift <- -tfsDT[[2, 3]] / tfsDT[[2, 2]] + + out <- lapply(seq_along(image_file), function(i) { + gimg <- createGiottoLargeImage( + raster_object = image_file[[i]], + name = name[[i]], + scale_factor = scalef, + negative_y = FALSE + ) - gimg <- spatShift(gimg, dx = x_shift, dy = y_shift) + gimg <- spatShift(gimg, dx = x_shift, dy = y_shift) - gimg@extent <- terra::ext(gimg@raster_object) - return(gimg) - }) + gimg@extent <- terra::ext(gimg@raster_object) + return(gimg) + }) - if (length(out) == 1L) { - out <- unlist(out) - } + if (length(out) == 1L) { + out <- unlist(out) + } - return(out) + return(out) } @@ -818,12 +834,13 @@ createMerscopeLargeImage <- function(image_file, #' @title Create Vizgen MERSCOPE Giotto Object #' @name createGiottoMerscopeObject -#' @description Given the path to a MERSCOPE experiment directory, creates a Giotto -#' object. +#' @description Given the path to a MERSCOPE experiment directory, creates a +#' Giotto object. #' @param merscope_dir full path to the exported merscope directory -#' @param data_to_use which of either the 'subcellular' or 'aggregate' information -#' to use for object creation -#' @param FOVs which FOVs to use when building the subcellular object. (default is NULL) +#' @param data_to_use which of either the 'subcellular' or 'aggregate' +#' information to use for object creation +#' @param FOVs which FOVs to use when building the subcellular object. +#' (default is NULL) #' NULL loads all FOVs (very slow) #' @param calculate_overlap whether to run \code{\link{calculateOverlapRaster}} #' @param overlap_to_matrix whether to run \code{\link{overlapToMatrix}} @@ -833,9 +850,10 @@ createMerscopeLargeImage <- function(image_file, #' @return a giotto object #' @export #' @details -#' [\strong{Expected Directory}] This function generates a giotto object when given a -#' link to a MERSCOPE output directory. It expects the following items within the directory -#' where the \strong{bolded} portions are what this function matches against: +#' [\strong{Expected Directory}] This function generates a giotto object when +#' given a link to a MERSCOPE output directory. 
It expects the following items +#' within the directory where the \strong{bolded} portions are what this +#' function matches against: #' \itemize{ #' \item{\strong{cell_boundaries} (folder .hdf5 files)} #' \item{\strong{images} (folder of .tif images and a scalefactor/transfrom table)} @@ -843,158 +861,170 @@ createMerscopeLargeImage <- function(image_file, #' \item{cell_metadata\strong{fov_positions_file}.csv (file)} #' \item{detected_transcripts\strong{metadata_file}.csv (file)} #' } -createGiottoMerscopeObject = function(merscope_dir, - data_to_use = c('subcellular', 'aggregate'), - FOVs = NULL, - poly_z_indices = 1:7, - calculate_overlap = TRUE, - overlap_to_matrix = TRUE, - aggregate_stack = TRUE, - aggregate_stack_param = list(summarize_expression = 'sum', - summarize_locations = 'mean', - new_spat_unit = 'cell'), - instructions = NULL, - cores = NA, - verbose = TRUE) { - - fovs = NULL - - # 0. setup - merscope_dir = path.expand(merscope_dir) - - poly_z_indices = as.integer(poly_z_indices) - if(any(poly_z_indices < 1)) stop(wrap_txt( - 'poly_z_indices is a vector of one or more integers starting from 1.', - errWidth = TRUE - )) - - # determine data to use - data_to_use = match.arg(arg = data_to_use, choices = c('subcellular','aggregate')) - - # 1. test if folder structure exists and is as expected - dir_items = .read_merscope_folder(merscope_dir = merscope_dir, - data_to_use = data_to_use, - cores = cores, - verbose = verbose) - - # 2. load in directory items - data_list = .load_merscope_folder(dir_items = dir_items, - data_to_use = data_to_use, - poly_z_indices = poly_z_indices, - fovs = fovs, - cores = cores, - verbose = verbose) - - # 3. Create giotto object - if(data_to_use == 'subcellular') { - - merscope_gobject = .createGiottoMerscopeObject_subcellular(data_list = data_list, - calculate_overlap = calculate_overlap, - overlap_to_matrix = overlap_to_matrix, - aggregate_stack = aggregate_stack, - aggregate_stack_param = aggregate_stack_param, - cores = cores, - verbose = verbose) - - } else if(data_to_use == 'aggregate') { - - merscope_gobject = .createGiottoMerscopeObject_aggregate(data_list = data_list, - cores = cores, - verbose = verbose) - - } else { - stop(wrap_txt('data_to_use "', data_to_use, '" not implemented', sep = '')) - } - - return(merscope_gobject) +createGiottoMerscopeObject <- function(merscope_dir, + data_to_use = c("subcellular", "aggregate"), + FOVs = NULL, + poly_z_indices = 1:7, + calculate_overlap = TRUE, + overlap_to_matrix = TRUE, + aggregate_stack = TRUE, + aggregate_stack_param = list( + summarize_expression = "sum", + summarize_locations = "mean", + new_spat_unit = "cell" + ), + instructions = NULL, + cores = NA, + verbose = TRUE) { + fovs <- NULL + + # 0. setup + merscope_dir <- path.expand(merscope_dir) + + poly_z_indices <- as.integer(poly_z_indices) + if (any(poly_z_indices < 1)) { + stop(wrap_txt( + "poly_z_indices is a vector of one or more integers starting from 1.", + errWidth = TRUE + )) + } + + # determine data to use + data_to_use <- match.arg( + arg = data_to_use, choices = c("subcellular", "aggregate")) + + # 1. test if folder structure exists and is as expected + dir_items <- .read_merscope_folder( + merscope_dir = merscope_dir, + data_to_use = data_to_use, + cores = cores, + verbose = verbose + ) + + # 2. load in directory items + data_list <- .load_merscope_folder( + dir_items = dir_items, + data_to_use = data_to_use, + poly_z_indices = poly_z_indices, + fovs = fovs, + cores = cores, + verbose = verbose + ) + + # 3. 
Create giotto object + if (data_to_use == "subcellular") { + merscope_gobject <- .createGiottoMerscopeObject_subcellular( + data_list = data_list, + calculate_overlap = calculate_overlap, + overlap_to_matrix = overlap_to_matrix, + aggregate_stack = aggregate_stack, + aggregate_stack_param = aggregate_stack_param, + cores = cores, + verbose = verbose + ) + } else if (data_to_use == "aggregate") { + merscope_gobject <- .createGiottoMerscopeObject_aggregate( + data_list = data_list, + cores = cores, + verbose = verbose + ) + } else { + stop(wrap_txt('data_to_use "', data_to_use, + '" not implemented', sep = "")) + } + return(merscope_gobject) } -#' @describeIn createGiottoMerscopeObject Create giotto object with 'subcellular' workflow +#' @describeIn createGiottoMerscopeObject Create giotto object with +#' 'subcellular' workflow #' @param data_list list of loaded data from \code{\link{load_merscope_folder}} #' @keywords internal -.createGiottoMerscopeObject_subcellular = function(data_list, - calculate_overlap = TRUE, - overlap_to_matrix = TRUE, - aggregate_stack = TRUE, - aggregate_stack_param = list(summarize_expression = 'sum', - summarize_locations = 'mean', - new_spat_unit = 'cell'), - cores = NA, - verbose = TRUE) { - - feat_coord = neg_coord = cellLabel_dir = instructions = NULL - - # unpack data_list - poly_info = data_list$poly_info - tx_dt = data_list$tx_dt - micronToPixelScale = data_list$micronToPixelScale - image_list = data_list$images - - # data.table vars - gene = NULL - - # split tx_dt by expression and blank - vmsg('Splitting detections by feature vs blank', .v = verbose) - feat_id_all = tx_dt[, unique(gene)] - blank_id = feat_id_all[grepl(pattern = 'Blank', feat_id_all)] - feat_id = feat_id_all[!feat_id_all %in% blank_id] - - feat_dt = tx_dt[gene %in% feat_id,] - blank_dt = tx_dt[gene %in% blank_id,] - - # extract transcript_id col and store as feature meta - feat_meta = unique(feat_dt[, c('gene', 'transcript_id', 'barcode_id'), with = FALSE]) - blank_meta = unique(blank_dt[, c('gene', 'transcript_id', 'barcode_id'), with = FALSE]) - feat_dt[, c('transcript_id', 'barcode_id') := NULL] - blank_dt[, c('transcript_id', 'barcode_id') := NULL] - - if(isTRUE(verbose)) { - message(' > Features: ', feat_dt[, .N]) - message(' > Blanks: ', blank_dt[, .N]) - } - - # build giotto object - vmsg('Building subcellular giotto object...', .v = verbose) - z_sub = createGiottoObjectSubcellular( - gpoints = list('rna' = feat_coord, - 'neg_probe' = neg_coord), - gpolygons = list('cell' = cellLabel_dir), - polygon_mask_list_params = list( - mask_method = 'guess', - flip_vertical = TRUE, - flip_horizontal = FALSE, - shift_horizontal_step = FALSE +.createGiottoMerscopeObject_subcellular <- function(data_list, + calculate_overlap = TRUE, + overlap_to_matrix = TRUE, + aggregate_stack = TRUE, + aggregate_stack_param = list( + summarize_expression = "sum", + summarize_locations = "mean", + new_spat_unit = "cell" ), - instructions = instructions, - cores = cores - ) + cores = NA, + verbose = TRUE) { + feat_coord <- neg_coord <- cellLabel_dir <- instructions <- NULL + + # unpack data_list + poly_info <- data_list$poly_info + tx_dt <- data_list$tx_dt + micronToPixelScale <- data_list$micronToPixelScale + image_list <- data_list$images + + # data.table vars + gene <- NULL + + # split tx_dt by expression and blank + vmsg("Splitting detections by feature vs blank", .v = verbose) + feat_id_all <- tx_dt[, unique(gene)] + blank_id <- feat_id_all[grepl(pattern = "Blank", feat_id_all)] + feat_id <- 
feat_id_all[!feat_id_all %in% blank_id] + + feat_dt <- tx_dt[gene %in% feat_id, ] + blank_dt <- tx_dt[gene %in% blank_id, ] + + # extract transcript_id col and store as feature meta + feat_meta <- unique(feat_dt[, c("gene", "transcript_id", "barcode_id"), + with = FALSE]) + blank_meta <- unique(blank_dt[, c("gene", "transcript_id", "barcode_id"), + with = FALSE]) + feat_dt[, c("transcript_id", "barcode_id") := NULL] + blank_dt[, c("transcript_id", "barcode_id") := NULL] + + if (isTRUE(verbose)) { + message(" > Features: ", feat_dt[, .N]) + message(" > Blanks: ", blank_dt[, .N]) + } + # build giotto object + vmsg("Building subcellular giotto object...", .v = verbose) + z_sub <- createGiottoObjectSubcellular( + gpoints = list( + "rna" = feat_coord, + "neg_probe" = neg_coord + ), + gpolygons = list("cell" = cellLabel_dir), + polygon_mask_list_params = list( + mask_method = "guess", + flip_vertical = TRUE, + flip_horizontal = FALSE, + shift_horizontal_step = FALSE + ), + instructions = instructions, + cores = cores + ) } -#' @describeIn createGiottoMerscopeObject Create giotto object with 'aggregate' workflow +#' @describeIn createGiottoMerscopeObject Create giotto object with 'aggregate' +#' workflow #' @param data_list list of loaded data from \code{\link{load_merscope_folder}} #' @keywords internal -.createGiottoMerscopeObject_aggregate = function(data_list, - cores = NA, - verbose = TRUE) { - - # unpack data_list - micronToPixelScale = data_list$micronToPixelScale - expr_dt = data_list$expr_dt - cell_meta = data_list$expr_mat - image_list = data_list$images - - # split expr_dt by expression and blank - - # feat_id_all = - +.createGiottoMerscopeObject_aggregate <- function(data_list, + cores = NA, + verbose = TRUE) { + # unpack data_list + micronToPixelScale <- data_list$micronToPixelScale + expr_dt <- data_list$expr_dt + cell_meta <- data_list$expr_mat + image_list <- data_list$images + + # split expr_dt by expression and blank + + # feat_id_all = } @@ -1005,38 +1035,39 @@ createGiottoMerscopeObject = function(merscope_dir, #' @title Create Spatial Genomics Giotto Object #' @name createSpatialGenomicsObject #' @param sg_dir full path to the exported Spatial Genomics directory -#' @param instructions new instructions (e.g. result from createGiottoInstructions) +#' @param instructions new instructions +#' (e.g. result from createGiottoInstructions) #' @description Given the path to a Spatial Genomics data directory, creates a #' Giotto object. 
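A minimal usage sketch (hypothetical path); the directory only needs to contain files matching the "DAPI", "mask", and "transcript" patterns that the function searches for below.

sg <- createSpatialGenomicsObject(
    sg_dir = "/path/to/spatial_genomics_export/", # hypothetical path
    instructions = NULL   # or the result of createGiottoInstructions()
)
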
#' @export createSpatialGenomicsObject <- function(sg_dir = NULL, - instructions = NULL) { - # Find files in Spatial Genomics directory - dapi = list.files(sg_dir, full.names = TRUE, pattern = 'DAPI') - mask = list.files(sg_dir, full.names = TRUE, pattern = 'mask') - tx = list.files(sg_dir, full.names = TRUE, pattern = 'transcript') - # Create Polygons - gpoly = createGiottoPolygonsFromMask( - mask, - shift_vertical_step = FALSE, - shift_horizontal_step = FALSE, - flip_horizontal = FALSE, - flip_vertical = FALSE - ) - # Create Points - tx = data.table::fread(tx) - gpoints = createGiottoPoints(tx) - dim(tx) - # Create object and add image - gimg = createGiottoLargeImage(dapi, use_rast_ext = TRUE) - sg = createGiottoObjectSubcellular( - gpoints = list('rna' = gpoints), - gpolygons = list('cell' = gpoly), - instructions = instructions - ) - sg = addGiottoImage(sg, images = list(image = gimg)) - # Return SG object - return(sg) + instructions = NULL) { + # Find files in Spatial Genomics directory + dapi <- list.files(sg_dir, full.names = TRUE, pattern = "DAPI") + mask <- list.files(sg_dir, full.names = TRUE, pattern = "mask") + tx <- list.files(sg_dir, full.names = TRUE, pattern = "transcript") + # Create Polygons + gpoly <- createGiottoPolygonsFromMask( + mask, + shift_vertical_step = FALSE, + shift_horizontal_step = FALSE, + flip_horizontal = FALSE, + flip_vertical = FALSE + ) + # Create Points + tx <- data.table::fread(tx) + gpoints <- createGiottoPoints(tx) + dim(tx) + # Create object and add image + gimg <- createGiottoLargeImage(dapi, use_rast_ext = TRUE) + sg <- createGiottoObjectSubcellular( + gpoints = list("rna" = gpoints), + gpolygons = list("cell" = gpoly), + instructions = instructions + ) + sg <- addGiottoImage(sg, images = list(image = gimg)) + # Return SG object + return(sg) } @@ -1051,19 +1082,22 @@ createSpatialGenomicsObject <- function(sg_dir = NULL, #' object. #' @param cosmx_dir full path to the exported cosmx directory #' @param data_to_use which type(s) of expression data to build the gobject with -#' Default is \code{'all'} information available. \code{'subcellular'} loads the transcript -#' coordinates only. \code{'aggregate'} loads the provided aggregated expression matrix. +#' Default is \code{'all'} information available. \code{'subcellular'} loads +#' the transcript coordinates only. \code{'aggregate'} loads the provided +#' aggregated expression matrix. #' @param FOVs field of views to load (only affects subcellular data and images) -#' @param remove_background_polygon try to remove background polygon (default: FALSE) +#' @param remove_background_polygon try to remove background polygon +#' (default: FALSE) #' @param background_algo algorithm to remove background polygon #' @param remove_unvalid_polygons remove unvalid polygons (default: TRUE) #' @inheritParams GiottoClass::createGiottoObjectSubcellular #' @return a giotto object #' @export #' @details -#' [\strong{Expected Directory}] This function generates a giotto object when given a -#' link to a cosmx output directory. It expects the following items within the directory -#' where the \strong{bolded} portions are what this function matches against: +#' [\strong{Expected Directory}] This function generates a giotto object when +#' given a link to a cosmx output directory. 
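For orientation before the expected layout is listed, a minimal call is sketched here with a hypothetical path; note that only the 'subcellular' workflow is currently implemented, as enforced in the function body.

cosmx_obj <- createGiottoCosMxObject(
    cosmx_dir = "/path/to/cosmx_output/", # hypothetical path
    data_to_use = "subcellular",  # "all" and "aggregate" currently stop()
    FOVs = NULL,                  # NULL loads all FOVs found
    remove_background_polygon = TRUE,
    instructions = NULL
)
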
It expects the following items +#' within the directory where the \strong{bolded} portions are what this +#' function matches against: #' \itemize{ #' \item{\strong{CellComposite} (folder of images)} #' \item{\strong{CellLabels} (folder of images)} @@ -1077,95 +1111,104 @@ createSpatialGenomicsObject <- function(sg_dir = NULL, #' #' [\strong{Workflows}] Workflow to use is accessed through the data_to_use param #' \itemize{ -#' \item{'all' - loads and requires subcellular information from tx_file and fov_positions_file -#' and also the existing aggregated information (expression, spatial locations, and metadata) +#' \item{'all' - loads and requires subcellular information from tx_file and +#' fov_positions_file +#' and also the existing aggregated information +#' (expression, spatial locations, and metadata) #' from exprMat_file and metadata_file.} -#' \item{'subcellular' - loads and requires subcellular information from tx_file and +#' \item{'subcellular' - loads and requires subcellular information from +#' tx_file and #' fov_positions_file only.} -#' \item{'aggregate' - loads and requires the existing aggregate information (expression, -#' spatial locations, and metadata) from exprMat_file and metadata_file.} +#' \item{'aggregate' - loads and requires the existing aggregate information +#' (expression, spatial locations, and metadata) from exprMat_file and +#' metadata_file.} #' } #' -#' [\strong{Images}] Images in the default CellComposite, CellLabels, CompartmentLabels, and CellOverlay -#' folders will be loaded as giotto largeImage objects in all workflows as long as they are available. -#' Additionally, CellComposite images will be converted to giotto image objects, making plotting with +#' [\strong{Images}] Images in the default CellComposite, CellLabels, +#' CompartmentLabels, and CellOverlay +#' folders will be loaded as giotto largeImage objects in all workflows as +#' long as they are available. Additionally, CellComposite images will be +#' converted to giotto image objects, making plotting with #' these image objects more responsive when accessing them from a server. #' \code{\link{showGiottoImageNames}} can be used to see the available images. #' #' -createGiottoCosMxObject = function(cosmx_dir = NULL, - data_to_use = c('all','subcellular','aggregate'), - remove_background_polygon = TRUE, - background_algo = c('range'), - remove_unvalid_polygons = TRUE, - FOVs = NULL, - instructions = NULL, - cores = determine_cores(), - verbose = TRUE) { - - # 0. setup - cosmx_dir = path.expand(cosmx_dir) - - # determine data to use - data_to_use = match.arg(arg = data_to_use, choices = c('all','subcellular','aggregate')) - if(data_to_use %in% c('all', 'aggregate')) { - stop(wrap_txt('Convenience workflows "all" and "aggregate" are not available yet')) - } - - # Define for data.table - fov = target = x_local_px = y_local_px = z = cell_ID = CenterX_global_px = CenterY_global_px = - CenterX_local_px = CenterY_local_px = NULL - - - # 1. test if folder structure exists and is as expected - dir_items = .read_cosmx_folder(cosmx_dir = cosmx_dir, - verbose = verbose) - - - # 2. 
load and create giotto object - cosmx_gobject <- switch(data_to_use, - "subcellular" = .createGiottoCosMxObject_subcellular( - dir_items, - FOVs = FOVs, - remove_background_polygon = remove_background_polygon, - background_algo = background_algo, - remove_unvalid_polygons = remove_unvalid_polygons, - cores = cores, - verbose = verbose, - instructions = instructions - ), - "aggregate" = .createGiottoCosMxObject_aggregate( - dir_items, - cores = cores, - verbose = verbose, - instructions = instructions - ), - "all" = .createGiottoCosMxObject_all( - dir_items, - FOVs = FOVs, - remove_background_polygon = remove_background_polygon, - background_algo = background_algo, - remove_unvalid_polygons = remove_unvalid_polygons, - cores = cores, - verbose = verbose, - instructions = instructions +createGiottoCosMxObject <- function(cosmx_dir = NULL, + data_to_use = c("all", "subcellular", "aggregate"), + remove_background_polygon = TRUE, + background_algo = c("range"), + remove_unvalid_polygons = TRUE, + FOVs = NULL, + instructions = NULL, + cores = determine_cores(), + verbose = TRUE) { + # 0. setup + cosmx_dir <- path.expand(cosmx_dir) + + # determine data to use + data_to_use <- match.arg( + arg = data_to_use, choices = c("all", "subcellular", "aggregate")) + if (data_to_use %in% c("all", "aggregate")) { + stop(wrap_txt('Convenience workflows "all" and "aggregate" are not + available yet')) + } + + # Define for data.table + fov <- target <- x_local_px <- y_local_px <- z <- cell_ID <- + CenterX_global_px <- CenterY_global_px <- + CenterX_local_px <- CenterY_local_px <- NULL + + + # 1. test if folder structure exists and is as expected + dir_items <- .read_cosmx_folder( + cosmx_dir = cosmx_dir, + verbose = verbose ) - ) - # load in subcellular information, subcellular FOV objects, then join + # 2. 
load and create giotto object + cosmx_gobject <- switch(data_to_use, + "subcellular" = .createGiottoCosMxObject_subcellular( + dir_items, + FOVs = FOVs, + remove_background_polygon = remove_background_polygon, + background_algo = background_algo, + remove_unvalid_polygons = remove_unvalid_polygons, + cores = cores, + verbose = verbose, + instructions = instructions + ), + "aggregate" = .createGiottoCosMxObject_aggregate( + dir_items, + cores = cores, + verbose = verbose, + instructions = instructions + ), + "all" = .createGiottoCosMxObject_all( + dir_items, + FOVs = FOVs, + remove_background_polygon = remove_background_polygon, + background_algo = background_algo, + remove_unvalid_polygons = remove_unvalid_polygons, + cores = cores, + verbose = verbose, + instructions = instructions + ) + ) + + # load in subcellular information, subcellular FOV objects, then join - # load in pre-generated aggregated expression matrix - if(data_to_use == 'aggregate' | data_to_use == 'all') { - } + # load in pre-generated aggregated expression matrix + if (data_to_use == "aggregate" | data_to_use == "all") { + } - vmsg('done') - return(cosmx_gobject) + vmsg("done") + return(cosmx_gobject) } @@ -1175,210 +1218,216 @@ createGiottoCosMxObject = function(cosmx_dir = NULL, #' @inheritParams createGiottoCosMxObject #' @keywords internal .createGiottoCosMxObject_subcellular <- function( - dir_items, - FOVs = NULL, - remove_background_polygon = TRUE, - background_algo = c('range'), - remove_unvalid_polygons = TRUE, - cores, - verbose = TRUE, - instructions = NULL -) { - - target <- fov <- NULL - - # load tx detections and FOV offsets ------------------------------------- # - data_list = .load_cosmx_folder_subcellular( - dir_items = dir_items, - FOVs = FOVs, - cores = cores, - verbose = verbose - ) - - # unpack data_list - FOV_ID = data_list$FOV_ID - fov_offset_file = data_list$fov_offset_file - tx_coord_all = data_list$tx_coord_all - - # remove global xy values and cell_ID - tx_coord_all[, c('x_global_px', 'y_global_px', 'cell_ID') := NULL] - - data.table::setcolorder(tx_coord_all, c('target', 'x_local_px', 'y_local_px', 'z', 'fov')) - - # feature detection type splitting --------------------------------------- # - - if(isTRUE(verbose)) wrap_msg('Splitting detections by feature vs neg probe') - all_IDs = tx_coord_all[, unique(target)] - neg_IDs = all_IDs[grepl(pattern = 'NegPrb', all_IDs)] - feat_IDs = all_IDs[!all_IDs %in% neg_IDs] + dir_items, + FOVs = NULL, + remove_background_polygon = TRUE, + background_algo = c("range"), + remove_unvalid_polygons = TRUE, + cores, + verbose = TRUE, + instructions = NULL) { + target <- fov <- NULL + + # load tx detections and FOV offsets ------------------------------------- # + data_list <- .load_cosmx_folder_subcellular( + dir_items = dir_items, + FOVs = FOVs, + cores = cores, + verbose = verbose + ) - # split detections DT - feat_coords_all = tx_coord_all[target %in% feat_IDs] - neg_coords_all = tx_coord_all[target %in% neg_IDs] + # unpack data_list + FOV_ID <- data_list$FOV_ID + fov_offset_file <- data_list$fov_offset_file + tx_coord_all <- data_list$tx_coord_all - if(isTRUE(verbose)) { - message(' > Features: ', feat_coords_all[, .N]) - message(' > NegProbes: ', neg_coords_all[, .N]) - } + # remove global xy values and cell_ID + tx_coord_all[, c("x_global_px", "y_global_px", "cell_ID") := NULL] - # FOV-based processing --------------------------------------------------- # + data.table::setcolorder( + tx_coord_all, c("target", "x_local_px", "y_local_px", "z", "fov")) - 
fov_gobjects_list = lapply(FOV_ID, function(x) { + # feature detection type splitting --------------------------------------- # - # images --------------------------------------------------- # - # build image paths - if(isTRUE(verbose)) message('Loading image information...') + if (isTRUE(verbose)) message("Splitting detections by feature vs neg probe") + all_IDs <- tx_coord_all[, unique(target)] + neg_IDs <- all_IDs[grepl(pattern = "NegPrb", all_IDs)] + feat_IDs <- all_IDs[!all_IDs %in% neg_IDs] - composite_dir = Sys.glob(paths = file.path(dir_items$`CellComposite folder`, paste0('*',x, '*'))) - cellLabel_dir = Sys.glob(paths = file.path(dir_items$`CellLabels folder`, paste0('*',x, '*'))) - compartmentLabel_dir = Sys.glob(paths = file.path(dir_items$`CompartmentLabels folder`, paste0('*',x, '*'))) - cellOverlay_dir = Sys.glob(paths = file.path(dir_items$`CellOverlay folder`, paste0('*',x, '*'))) + # split detections DT + feat_coords_all <- tx_coord_all[target %in% feat_IDs] + neg_coords_all <- tx_coord_all[target %in% neg_IDs] - # Missing warnings - if(length(composite_dir) == 0) { - warning('[ FOV ', x, ' ] No composite images found') - composite_dir = NULL - } - if(length(cellLabel_dir) == 0) { - stop('[ FOV ', x, ' ] No cell mask images found') - } # cell masks are necessary - if(length(compartmentLabel_dir) == 0) { - warning('[ FOV ', x, ' ] No compartment label images found') - compartmentLabel_dir = NULL + if (isTRUE(verbose)) { + message(" > Features: ", feat_coords_all[, .N]) + message(" > NegProbes: ", neg_coords_all[, .N]) } - if(length(cellOverlay_dir) == 0) { - warning('[ FOV ', x, ' ] No cell polygon overlay images found') - cellOverlay_dir = NULL - } - - if(isTRUE(verbose)) message('Image load done') - - if(isTRUE(verbose)) wrap_msg('[ FOV ', x, ']') - - - # transcripts ---------------------------------------------- # - # get FOV specific tx locations - if(isTRUE(verbose)) wrap_msg('Assigning FOV feature detections...') + # FOV-based processing --------------------------------------------------- # + + fov_gobjects_list <- lapply(FOV_ID, function(x) { + # images --------------------------------------------------- # + # build image paths + if (isTRUE(verbose)) message("Loading image information...") + + composite_dir <- Sys.glob(paths = file.path( + dir_items$`CellComposite folder`, paste0("*", x, "*"))) + cellLabel_dir <- Sys.glob(paths = file.path( + dir_items$`CellLabels folder`, paste0("*", x, "*"))) + compartmentLabel_dir <- Sys.glob(paths = file.path( + dir_items$`CompartmentLabels folder`, paste0("*", x, "*"))) + cellOverlay_dir <- Sys.glob(paths = file.path( + dir_items$`CellOverlay folder`, paste0("*", x, "*"))) + + # Missing warnings + if (length(composite_dir) == 0) { + warning("[ FOV ", x, " ] No composite images found") + composite_dir <- NULL + } + if (length(cellLabel_dir) == 0) { + stop("[ FOV ", x, " ] No cell mask images found") + } # cell masks are necessary + if (length(compartmentLabel_dir) == 0) { + warning("[ FOV ", x, " ] No compartment label images found") + compartmentLabel_dir <- NULL + } + if (length(cellOverlay_dir) == 0) { + warning("[ FOV ", x, " ] No cell polygon overlay images found") + cellOverlay_dir <- NULL + } - # feature info - coord_oldnames = c('target', 'x_local_px', 'y_local_px') - coord_newnames = c('feat_ID', 'x', 'y') + if (isTRUE(verbose)) message("Image load done") + + if (isTRUE(verbose)) wrap_msg("[ FOV ", x, "]") + + + # transcripts ---------------------------------------------- # + # get FOV specific tx locations + if 
(isTRUE(verbose)) message("Assigning FOV feature detections...") + + + # feature info + coord_oldnames <- c("target", "x_local_px", "y_local_px") + coord_newnames <- c("feat_ID", "x", "y") + + feat_coord <- feat_coords_all[fov == as.numeric(x)] + data.table::setnames( + feat_coord, old = coord_oldnames, new = coord_newnames) + # neg probe info + neg_coord <- neg_coords_all[fov == as.numeric(x)] + data.table::setnames( + neg_coord, old = coord_oldnames, new = coord_newnames) + + + # build giotto object -------------------------------------- # + if (isTRUE(verbose)) message("Building subcellular giotto object...") + fov_subset <- createGiottoObjectSubcellular( + gpoints = list( + "rna" = feat_coord, + "neg_probe" = neg_coord + ), + gpolygons = list("cell" = cellLabel_dir), + polygon_mask_list_params = list( + mask_method = "guess", + flip_vertical = TRUE, + flip_horizontal = FALSE, + shift_horizontal_step = FALSE, + remove_background_polygon = remove_background_polygon, + background_algo = background_algo, + remove_unvalid_polygons = remove_unvalid_polygons + ), + instructions = instructions, + cores = cores + ) - feat_coord = feat_coords_all[fov == as.numeric(x)] - data.table::setnames(feat_coord, old = coord_oldnames, new = coord_newnames) - # neg probe info - neg_coord = neg_coords_all[fov == as.numeric(x)] - data.table::setnames(neg_coord, old = coord_oldnames, new = coord_newnames) + # find centroids as spatial locations ---------------------- # + if (isTRUE(verbose)) + message("Finding polygon centroids as cell spatial locations...") + fov_subset <- addSpatialCentroidLocations( + fov_subset, + poly_info = "cell", + spat_loc_name = "raw" + ) - # build giotto object -------------------------------------- # - if(isTRUE(verbose)) wrap_msg('Building subcellular giotto object...') - fov_subset = createGiottoObjectSubcellular( - gpoints = list('rna' = feat_coord, - 'neg_probe' = neg_coord), - gpolygons = list('cell' = cellLabel_dir), - polygon_mask_list_params = list( - mask_method = 'guess', - flip_vertical = TRUE, - flip_horizontal = FALSE, - shift_horizontal_step = FALSE, - remove_background_polygon = remove_background_polygon, - background_algo = background_algo, - remove_unvalid_polygons = remove_unvalid_polygons - ), - instructions = instructions, - cores = cores - ) + # create and add giotto image objects ---------------------- # + if (isTRUE(verbose)) { + message("Attaching image files...") + print(composite_dir) + print(cellOverlay_dir) + print(compartmentLabel_dir) + } - # find centroids as spatial locations ---------------------- # - if(isTRUE(verbose)) wrap_msg('Finding polygon centroids as cell spatial locations...') - fov_subset = addSpatialCentroidLocations( - fov_subset, - poly_info = 'cell', - spat_loc_name = 'raw' - ) + gImage_list <- list() + # load image if files are found + if (!is.null(composite_dir)) { + gImage_list$composite <- createGiottoLargeImage( + raster_object = composite_dir, + negative_y = FALSE, + name = "composite" + ) + } + if (!is.null(cellOverlay_dir)) { + gImage_list$overlay <- createGiottoLargeImage( + raster_object = cellOverlay_dir, + negative_y = FALSE, + name = "overlay" + ) + } + if (!is.null(compartmentLabel_dir)) { + gImage_list$compartment <- createGiottoLargeImage( + raster_object = compartmentLabel_dir, + negative_y = FALSE, + name = "compartment" + ) + } # TODO + + + + if (length(gImage_list) > 0) { + fov_subset <- addGiottoImage( + gobject = fov_subset, + images = gImage_list + ) + + # convert to MG for faster loading (particularly relevant 
for + # pulling from server) + # TODO remove this + fov_subset <- convertGiottoLargeImageToMG( + giottoLargeImage = gImage_list$composite, + gobject = fov_subset, + return_gobject = TRUE, + verbose = FALSE + ) + } else { + message("No images found for fov") + } + }) # lapply end - # create and add giotto image objects ---------------------- # - if(isTRUE(verbose)) { - message('Attaching image files...') - print(composite_dir) - print(cellOverlay_dir) - print(compartmentLabel_dir) - } + # returning -------------------------------------------------------------- # - gImage_list = list() - - # load image if files are found - if(!is.null(composite_dir)) - gImage_list$composite = createGiottoLargeImage( - raster_object = composite_dir, - negative_y = FALSE, - name = 'composite' - ) - if(!is.null(cellOverlay_dir)) - gImage_list$overlay = createGiottoLargeImage( - raster_object = cellOverlay_dir, - negative_y = FALSE, - name = 'overlay' - ) - if(!is.null(compartmentLabel_dir)) - gImage_list$compartment = createGiottoLargeImage( - raster_object = compartmentLabel_dir, - negative_y = FALSE, - name = 'compartment' - ) #TODO - - - - if(length(gImage_list) > 0) { - fov_subset = addGiottoImage( - gobject = fov_subset, - images = gImage_list - ) - - # convert to MG for faster loading (particularly relevant for pulling from server) - # TODO remove this - fov_subset = convertGiottoLargeImageToMG( - giottoLargeImage = gImage_list$composite, - gobject = fov_subset, - return_gobject = TRUE, - verbose = FALSE - ) - # fov_subset = convertGiottoLargeImageToMG(giottoLargeImage = gImage_list$overlay, gobject = fov_subset, return_gobject = TRUE) - # fov_subset = convertGiottoLargeImageToMG(giottoLargeImage = gImage_list$compartment, gobject = fov_subset, return_gobject = TRUE) + if (length(FOVs) == 1) { + return(fov_gobjects_list[[1]]) } else { - message('No images found for fov') + # join giotto objects according to FOV positions file + if (isTRUE(verbose)) message("Joining FOV gobjects...") + new_gobj_names <- paste0("fov", FOV_ID) + id_match <- match(as.numeric(FOV_ID), fov_offset_file$fov) + x_shifts <- fov_offset_file[id_match]$x_global_px + y_shifts <- fov_offset_file[id_match]$y_global_px + + # Join giotto objects + cosmx_gobject <- joinGiottoObjects( + gobject_list = fov_gobjects_list, + gobject_names = new_gobj_names, + join_method = "shift", + x_shift = x_shifts, + y_shift = y_shifts + ) + return(cosmx_gobject) } - - - }) #lapply end - - # returning -------------------------------------------------------------- # - - if(length(FOVs) == 1) { - return(fov_gobjects_list[[1]]) - } else { - # join giotto objects according to FOV positions file - if(isTRUE(verbose)) message('Joining FOV gobjects...') - new_gobj_names = paste0('fov', FOV_ID) - id_match = match(as.numeric(FOV_ID), fov_offset_file$fov) - x_shifts = fov_offset_file[id_match]$x_global_px - y_shifts = fov_offset_file[id_match]$y_global_px - - # Join giotto objects - cosmx_gobject = joinGiottoObjects( - gobject_list = fov_gobjects_list, - gobject_names = new_gobj_names, - join_method = 'shift', - x_shift = x_shifts, - y_shift = y_shifts - ) - return(cosmx_gobject) - } - } @@ -1387,184 +1436,209 @@ createGiottoCosMxObject = function(cosmx_dir = NULL, #' @name .createGiottoCosMxObject_aggregate #' @inheritParams createGiottoCosMxObject #' @keywords internal -.createGiottoCosMxObject_aggregate = function(dir_items, - cores, - verbose = TRUE, - instructions = NULL) { - - data_to_use = fov = NULL - - data_list = .load_cosmx_folder_aggregate(dir_items = 
dir_items, - cores = cores, - verbose = verbose) - - # unpack data_list - spatlocs = data_list$spatlocs - spatlocs_fov = data_list$spatlocs_fov - metadata = data_list$metadata - protM = data_list$protM - spM = data_list$spM - fov_shifts = data_list$fov_shifts - - - # create standard gobject from aggregate matrix - if(data_to_use == 'aggregate') { - - # Create aggregate gobject - if(isTRUE(verbose)) message('Building giotto object...') - cosmx_gobject = createGiottoObject(expression = list('raw' = spM, 'protein' = protM), - cell_metadata = list('cell' = list('rna' = metadata, - 'protein' = metadata)), - spatial_locs = spatlocs, - instructions = instructions, - cores = cores) - - - # load in images - img_ID = data.table::data.table(fov = fov_shifts[, fov], - img_name = paste0('fov', sprintf('%03d', fov_shifts[, fov]), '-image')) - - if(isTRUE(verbose)) message('Attaching image files...') - composite_dir = Sys.glob(paths = file.path(dir_items$`CellComposite folder`, paste0('/*'))) - cellLabel_dir = Sys.glob(paths = file.path(dir_items$`CellLabels folder`, paste0('/*'))) - compartmentLabel_dir = Sys.glob(paths = file.path(dir_items$`CompartmentLabels folder`, paste0('/*'))) - overlay_dir = Sys.glob(paths = file.path(dir_items$`CellOverlay folder`, paste0('/*'))) +.createGiottoCosMxObject_aggregate <- function(dir_items, + cores, + verbose = TRUE, + instructions = NULL) { + data_to_use <- fov <- NULL - if(length(cellLabel_imgList) > 0) cellLabel_imgList = lapply(cellLabel_dir, function(x) {createGiottoLargeImage(x,name = 'cellLabel',negative_y = TRUE)}) - if(length(composite_imgList) > 0) composite_imgList = lapply(composite_dir, function(x) {createGiottoLargeImage(x,name = 'composite',negative_y = TRUE)}) - if(length(compartmentLabel_dir) > 0) compartmentLabel_imgList = lapply(compartmentLabel_dir, function(x) {createGiottoLargeImage(x,name = 'composite',negative_y = TRUE)}) - if(length(overlay_dir) > 0) overlay_imgList = lapply(overlay_dir, function(x) {createGiottoLargeImage(x,name = 'composite',negative_y = TRUE)}) + data_list <- .load_cosmx_folder_aggregate( + dir_items = dir_items, + cores = cores, + verbose = verbose + ) + # unpack data_list + spatlocs <- data_list$spatlocs + spatlocs_fov <- data_list$spatlocs_fov + metadata <- data_list$metadata + protM <- data_list$protM + spM <- data_list$spM + fov_shifts <- data_list$fov_shifts + + + # create standard gobject from aggregate matrix + if (data_to_use == "aggregate") { + # Create aggregate gobject + if (isTRUE(verbose)) message("Building giotto object...") + cosmx_gobject <- createGiottoObject( + expression = list("raw" = spM, "protein" = protM), + cell_metadata = list("cell" = list( + "rna" = metadata, + "protein" = metadata + )), + spatial_locs = spatlocs, + instructions = instructions, + cores = cores + ) - } + # load in images + img_ID <- data.table::data.table( + fov = fov_shifts[, fov], + img_name = paste0("fov", + sprintf("%03d", fov_shifts[, fov]), "-image") + ) + if (isTRUE(verbose)) message("Attaching image files...") + composite_dir <- Sys.glob(paths = file.path( + dir_items$`CellComposite folder`, paste0("/*"))) + cellLabel_dir <- Sys.glob(paths = file.path( + dir_items$`CellLabels folder`, paste0("/*"))) + compartmentLabel_dir <- Sys.glob(paths = file.path( + dir_items$`CompartmentLabels folder`, paste0("/*"))) + overlay_dir <- Sys.glob(paths = file.path( + dir_items$`CellOverlay folder`, paste0("/*"))) + + if (length(cellLabel_imgList) > 0) { + cellLabel_imgList <- lapply(cellLabel_dir, function(x) { + 
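# wraps the matched image file x as a giottoLargeImage object
# NB: the length(cellLabel_imgList) guard above (and the
# length(composite_imgList) guard below) test objects that have not been
# assigned at this point; the corresponding *_dir match vectors are
# presumably the intended condition, as used for the compartmentLabel and
# overlay checks further down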
createGiottoLargeImage(x, name = "cellLabel", negative_y = TRUE) + }) + } + if (length(composite_imgList) > 0) { + composite_imgList <- lapply(composite_dir, function(x) { + createGiottoLargeImage(x, name = "composite", negative_y = TRUE) + }) + } + if (length(compartmentLabel_dir) > 0) { + compartmentLabel_imgList <- lapply( + compartmentLabel_dir, function(x) { + createGiottoLargeImage(x, name = "composite", negative_y = TRUE) + }) + } + if (length(overlay_dir) > 0) { + overlay_imgList <- lapply(overlay_dir, function(x) { + createGiottoLargeImage(x, name = "composite", negative_y = TRUE) + }) + } + } } -#' @title Load and create a CosMx Giotto object from subcellular and aggregate info +#' @title Load and create a CosMx Giotto object from subcellular and aggregate +#' info #' @name .createGiottoCosMxObject_all #' @param dir_items list of full directory paths from \code{.read_cosmx_folder} #' @inheritParams createGiottoCosMxObject -#' @details Both \emph{subcellular} (subellular transcript detection information) and -#' \emph{aggregate} (aggregated detection count matrices by cell polygon from NanoString) +#' @details Both \emph{subcellular} +#' (subellular transcript detection information) and +#' \emph{aggregate} (aggregated detection count matrices by cell polygon from +#' NanoString) #' data will be loaded in. The two will be separated into 'cell' and 'cell_agg' #' spatial units in order to denote the difference in origin of the two. #' @seealso createGiottoCosMxObject .createGiottoCosMxObject_aggregate #' .createGiottoCosMxObject_subcellular #' @keywords internal -.createGiottoCosMxObject_all = function(dir_items, - FOVs, - remove_background_polygon = TRUE, - background_algo = c('range'), - remove_unvalid_polygons = TRUE, - cores, - verbose = TRUE, - instructions = NULL, - ...) { - - # 1. create subcellular giotto as spat_unit 'cell' - cosmx_gobject = .createGiottoCosMxObject_subcellular(dir_items = dir_items, - FOVs = FOVs, - remove_background_polygon = remove_background_polygon, - background_algo = background_algo, - remove_unvalid_polygons = remove_unvalid_polygons, - cores = cores, - verbose = verbose, - instructions = instructions) - - # 2. load and append aggregated information in spat_unit 'cell_agg' - agg_data = .load_cosmx_folder_aggregate(dir_items = dir_items, - cores = cores, - verbose = verbose) - - # unpack data_list - spatlocs = agg_data$spatlocs - spatlocs_fov = agg_data$spatlocs_fov - metadata = agg_data$metadata - protM = agg_data$protM - spM = agg_data$spM - - # add in pre-generated aggregated expression matrix information for 'all' workflow - - # Add aggregate expression information - if(isTRUE(verbose)) wrap_msg('Appending provided aggregate expression data as... +.createGiottoCosMxObject_all <- function(dir_items, + FOVs, + remove_background_polygon = TRUE, + background_algo = c("range"), + remove_unvalid_polygons = TRUE, + cores, + verbose = TRUE, + instructions = NULL, + ...) { + # 1. create subcellular giotto as spat_unit 'cell' + cosmx_gobject <- .createGiottoCosMxObject_subcellular( + dir_items = dir_items, + FOVs = FOVs, + remove_background_polygon = remove_background_polygon, + background_algo = background_algo, + remove_unvalid_polygons = remove_unvalid_polygons, + cores = cores, + verbose = verbose, + instructions = instructions + ) + + # 2. 
load and append aggregated information in spat_unit 'cell_agg' + agg_data <- .load_cosmx_folder_aggregate( + dir_items = dir_items, + cores = cores, + verbose = verbose + ) + + # unpack data_list + spatlocs <- agg_data$spatlocs + spatlocs_fov <- agg_data$spatlocs_fov + metadata <- agg_data$metadata + protM <- agg_data$protM + spM <- agg_data$spM + + # add in pre-generated aggregated expression matrix information for 'all' + # workflow + + # Add aggregate expression information + if (isTRUE(verbose)) wrap_msg( + 'Appending provided aggregate expression data as... spat_unit: "cell_agg" feat_type: "rna" name: "raw"') - # add expression data to expression slot - s4_expr = createExprObj( - name = 'raw', - expression_data = spM, - spat_unit = 'cell_agg', - feat_type = 'rna', - provenance = 'cell_agg' - ) - - cosmx_gobject = set_expression_values(cosmx_gobject, values = s4_expr) - - # Add spatial locations - if(isTRUE(verbose)) wrap_msg('Appending metadata provided spatial locations data as... + # add expression data to expression slot + s4_expr <- createExprObj( + name = "raw", + expression_data = spM, + spat_unit = "cell_agg", + feat_type = "rna", + provenance = "cell_agg" + ) + + cosmx_gobject <- set_expression_values(cosmx_gobject, values = s4_expr) + + # Add spatial locations + if (isTRUE(verbose)) wrap_msg( + 'Appending metadata provided spatial locations data as... --> spat_unit: "cell_agg" name: "raw" --> spat_unit: "cell" name: "raw_fov"') - if(isTRUE(verbose)) wrap_msg('Polygon centroid derived spatial locations assigned as... + if (isTRUE(verbose)) wrap_msg( + 'Polygon centroid derived spatial locations assigned as... --> spat_unit: "cell" name: "raw" (default)') - locsObj = create_spat_locs_obj(name = 'raw', - coordinates = spatlocs, - spat_unit = 'cell_agg', - provenance = 'cell_agg') - locsObj_fov = create_spat_locs_obj(name = 'raw_fov', - coordinates = spatlocs_fov, - spat_unit = 'cell_agg', - provenance = 'cell_agg') - - cosmx_gobject = set_spatial_locations(cosmx_gobject, spatlocs = locsObj) - cosmx_gobject = set_spatial_locations(cosmx_gobject, spatlocs = locsObj_fov) - - # cosmx_gobject = set_spatial_locations(cosmx_gobject, - # spat_unit = 'cell_agg', - # spat_loc_name = 'raw', - # spatlocs = spatlocs) - # cosmx_gobject = set_spatial_locations(cosmx_gobject, - # spat_unit = 'cell_agg', - # spat_loc_name = 'raw_fov', - # spatlocs = spatlocs_fov) - - # initialize cell and feat IDs and metadata slots for 'cell_agg' spat_unit - agg_cell_ID = colnames(s4_expr[]) - agg_feat_ID = rownames(s4_expr[]) - - sub_feat_ID <- featIDs(cosmx_gobject, feat_type = "rna") - feat_ID_new = unique(c(agg_feat_ID, sub_feat_ID)) - - # cosmx_gobject = set_cell_id(gobject = cosmx_gobject, - # spat_unit = 'cell_agg', - # cell_IDs = agg_cell_ID) - # cosmx_gobject = set_feat_id(gobject = cosmx_gobject, - # feat_type = 'rna', - # feat_IDs = feat_ID_new) - - # cell metadata - - # Add metadata to both the given and the poly spat_units - if(isTRUE(verbose)) message('Appending provided cell metadata...') - cosmx_gobject = addCellMetadata(cosmx_gobject, - spat_unit = 'cell', - feat_type = 'rna', - new_metadata = metadata, - by_column = TRUE, - column_cell_ID = 'cell_ID') - cosmx_gobject = addCellMetadata(cosmx_gobject, - spat_unit = 'cell_agg', - feat_type = 'rna', - new_metadata = metadata, - by_column = TRUE, - column_cell_ID = 'cell_ID') - - initialize(cosmx_gobject) + locsObj <- create_spat_locs_obj( + name = "raw", + coordinates = spatlocs, + spat_unit = "cell_agg", + provenance = "cell_agg" + ) + 
locsObj_fov <- create_spat_locs_obj( + name = "raw_fov", + coordinates = spatlocs_fov, + spat_unit = "cell_agg", + provenance = "cell_agg" + ) + + cosmx_gobject <- set_spatial_locations(cosmx_gobject, spatlocs = locsObj) + cosmx_gobject <- set_spatial_locations(cosmx_gobject, + spatlocs = locsObj_fov) + + # initialize cell and feat IDs and metadata slots for 'cell_agg' spat_unit + agg_cell_ID <- colnames(s4_expr[]) + agg_feat_ID <- rownames(s4_expr[]) + + sub_feat_ID <- featIDs(cosmx_gobject, feat_type = "rna") + feat_ID_new <- unique(c(agg_feat_ID, sub_feat_ID)) + + # cell metadata + + # Add metadata to both the given and the poly spat_units + if (isTRUE(verbose)) message("Appending provided cell metadata...") + cosmx_gobject <- addCellMetadata(cosmx_gobject, + spat_unit = "cell", + feat_type = "rna", + new_metadata = metadata, + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + cosmx_gobject <- addCellMetadata(cosmx_gobject, + spat_unit = "cell_agg", + feat_type = "rna", + new_metadata = metadata, + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + + initialize(cosmx_gobject) } @@ -1580,24 +1654,25 @@ createGiottoCosMxObject = function(cosmx_dir = NULL, #' @title Create 10x Xenium Giotto Object #' @name createGiottoXeniumObject -#' @description Given the path to a Xenium experiment output folder, creates a Giotto -#' object +#' @description Given the path to a Xenium experiment output folder, creates a +#' Giotto object #' @param xenium_dir full path to the exported xenium directory #' @param data_to_use which type(s) of expression data to build the gobject with #' (e.g. default: \strong{'subcellular'}, 'aggregate', or 'all') #' @param load_format files formats from which to load the data. Either `csv` or #' `parquet` currently supported. -#' @param h5_expression (boolean) whether to load cell_feature_matrix from .h5 file. -#' Default is \code{TRUE} +#' @param h5_expression (boolean) whether to load cell_feature_matrix from .h5 +#' file. Default is \code{TRUE} #' @param h5_gene_ids use gene symbols (default) or ensembl ids for the .h5 gene #' expression matrix -#' @param bounds_to_load vector of boundary information to load (e.g. \code{'cell'} +#' @param bounds_to_load vector of boundary information to load +#' (e.g. \code{'cell'} #' or \code{'nucleus'} by themselves or \code{c('cell', 'nucleus')} to load both #' at the same time.) -#' @param qv_threshold Minimum Phred-scaled quality score cutoff to be included as -#' a subcellular transcript detection (default = 20) -#' @param key_list (advanced) list of grep-based keywords to split the subcellular -#' feature detections by feature type. See details +#' @param qv_threshold Minimum Phred-scaled quality score cutoff to be included +#' as a subcellular transcript detection (default = 20) +#' @param key_list (advanced) list of grep-based keywords to split the +#' subcellular feature detections by feature type. See details #' @inheritParams get10Xmatrix #' @inheritParams GiottoClass::createGiottoObjectSubcellular #' @details @@ -1606,19 +1681,21 @@ createGiottoCosMxObject = function(cosmx_dir = NULL, #' Xenium provides info on feature detections that include more than only the #' Gene Expression specific probes. Additional probes for QC are included: #' \emph{blank codeword}, \emph{negative control codeword}, and -#' \emph{negative control probe}. These additional QC probes each occupy and are treated -#' as their own feature types so that they can largely remain independent of the -#' gene expression information. 
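To make the key_list mechanism described in these details concrete, a hypothetical custom list is sketched below (mirroring the default); the list names become feature types, the values are grep patterns matched against feature IDs, and IDs matching none of the patterns are assigned to the 'rna' feature type.

my_key_list <- list(
    blank_code = "BLANK_",
    neg_code = "NegControlCodeword_",
    neg_probe = "NegControlProbe_|antisense_"
)
# later passed as: createGiottoXeniumObject(..., key_list = my_key_list)
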
+#' \emph{negative control probe}. These additional QC probes each occupy and +#' are treated as their own feature types so that they can largely remain +#' independent of the gene expression information. #' #' [\strong{key_list}] #' Related to \code{data_to_use = 'subcellular'} workflow only: -#' Additional QC probe information is in the subcellular feature detections information -#' and must be separated from the gene expression information during processing. -#' The QC probes have prefixes that allow them to be selected from the rest of the -#' feature IDs. -#' Giotto uses a named list of keywords (\code{key_list}) to select these QC probes, -#' with the list names being the names that will be assigned as the feature type -#' of these feature detections. The default list is used when \code{key_list} = NULL. +#' Additional QC probe information is in the subcellular feature detections +#' information and must be separated from the gene expression information +#' during processing. +#' The QC probes have prefixes that allow them to be selected from the rest of +#' the feature IDs. +#' Giotto uses a named list of keywords (\code{key_list}) to select these QC +#' probes, with the list names being the names that will be assigned as the +#' feature type of these feature detections. The default list is used when +#' \code{key_list} = NULL. #' #' Default list: #' \preformatted{ @@ -1631,113 +1708,119 @@ createGiottoCosMxObject = function(cosmx_dir = NULL, #' map to any of the keys. #' #' @export -createGiottoXeniumObject = function(xenium_dir, - data_to_use = c('subcellular','aggregate'), - load_format = 'csv', - h5_expression = TRUE, - h5_gene_ids = c('symbols', 'ensembl'), - gene_column_index = 1, - bounds_to_load = c('cell'), - qv_threshold = 20, - key_list = NULL, - # include_analysis = FALSE, - instructions = NULL, - cores = NA, - verbose = TRUE) { - - # 0. setup - xenium_dir = path.expand(xenium_dir) - - # Determine data to load - data_to_use = match.arg(arg = data_to_use, choices = c('subcellular','aggregate')) - - # Determine load formats - load_format = 'csv' # TODO Remove this and add as param once other options are available - load_format = match.arg(arg = load_format, choices = c('csv', 'parquet', 'zarr')) - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - # 1. detect xenium folder and find filepaths to load - - # path_list contents: - # tx_path - # bound_paths - # cell_meta_path - # agg_expr_path - # panel_meta_path - path_list = .read_xenium_folder(xenium_dir = xenium_dir, - data_to_use = data_to_use, - bounds_to_load = bounds_to_load, - load_format = load_format, - h5_expression = h5_expression, - verbose = verbose) - - - # 2. load in data - - # data_list contents: - # feat_meta - # tx_dt - # bound_dt_list - # cell_meta - # agg_expr - data_list = .load_xenium_folder(path_list = path_list, - load_format = load_format, - data_to_use = data_to_use, - h5_expression = h5_expression, - h5_gene_ids = h5_gene_ids, - gene_column_index = gene_column_index, - cores = cores, - verbose = verbose) - - - # TODO load images - - - # 3. 
Create giotto objects - - if(data_to_use == 'subcellular') { - - # ** feat type search keys ** - if(is.null(key_list)) { - key_list = list(blank_code = 'BLANK_', - neg_code = 'NegControlCodeword_', - neg_probe = c('NegControlProbe_|antisense_')) - } +createGiottoXeniumObject <- function(xenium_dir, + data_to_use = c("subcellular", "aggregate"), + load_format = "csv", + h5_expression = TRUE, + h5_gene_ids = c("symbols", "ensembl"), + gene_column_index = 1, + bounds_to_load = c("cell"), + qv_threshold = 20, + key_list = NULL, + instructions = NULL, + cores = NA, + verbose = TRUE) { + # 0. setup + xenium_dir <- path.expand(xenium_dir) + + # Determine data to load + data_to_use <- match.arg( + arg = data_to_use, choices = c("subcellular", "aggregate")) + + # Determine load formats + load_format <- "csv" # TODO Remove this and add as param once other options + # are available + load_format <- match.arg( + arg = load_format, choices = c("csv", "parquet", "zarr")) + + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) + + # 1. detect xenium folder and find filepaths to load + + # path_list contents: + # tx_path + # bound_paths + # cell_meta_path + # agg_expr_path + # panel_meta_path + path_list <- .read_xenium_folder( + xenium_dir = xenium_dir, + data_to_use = data_to_use, + bounds_to_load = bounds_to_load, + load_format = load_format, + h5_expression = h5_expression, + verbose = verbose + ) - # needed: + + # 2. load in data + + # data_list contents: # feat_meta # tx_dt # bound_dt_list - xenium_gobject = .createGiottoXeniumObject_subcellular(data_list = data_list, - qv_threshold = qv_threshold, - key_list = key_list, - instructions = instructions, - cores = cores, - verbose = verbose) + # cell_meta + # agg_expr + data_list <- .load_xenium_folder( + path_list = path_list, + load_format = load_format, + data_to_use = data_to_use, + h5_expression = h5_expression, + h5_gene_ids = h5_gene_ids, + gene_column_index = gene_column_index, + cores = cores, + verbose = verbose + ) - } - if(data_to_use == 'aggregate') { + # TODO load images - # needed: - # feat_meta - # cell_meta - # agg_expr - # optional? - # tx_dt - # bound_dt_list - xenium_gobject = .createGiottoXeniumObject_aggregate(data_list = data_list, - instructions = instructions, - cores = cores, - verbose = verbose) - } + # 3. Create giotto objects + + if (data_to_use == "subcellular") { + # ** feat type search keys ** + if (is.null(key_list)) { + key_list <- list( + blank_code = "BLANK_", + neg_code = "NegControlCodeword_", + neg_probe = c("NegControlProbe_|antisense_") + ) + } + + # needed: + # feat_meta + # tx_dt + # bound_dt_list + xenium_gobject <- .createGiottoXeniumObject_subcellular( + data_list = data_list, + qv_threshold = qv_threshold, + key_list = key_list, + instructions = instructions, + cores = cores, + verbose = verbose + ) + } - return(xenium_gobject) + if (data_to_use == "aggregate") { + # needed: + # feat_meta + # cell_meta + # agg_expr + # optional? 
+ # tx_dt + # bound_dt_list + xenium_gobject <- .createGiottoXeniumObject_aggregate( + data_list = data_list, + instructions = instructions, + cores = cores, + verbose = verbose + ) + } + return(xenium_gobject) } @@ -1749,96 +1832,90 @@ createGiottoXeniumObject = function(xenium_dir, #' @param data_list list of data loaded by \code{\link{.load_xenium_folder}} #' @param key_list regex-based search keys for feature IDs to allow separation #' into separate giottoPoints objects by feat_type -#' @param qv_threshold Minimum Phred-scaled quality score cutoff to be included as -#' a subcellular transcript detection (default = 20) +#' @param qv_threshold Minimum Phred-scaled quality score cutoff to be included +#' as a subcellular transcript detection (default = 20) #' @inheritParams get10Xmatrix #' @inheritParams GiottoClass::createGiottoObjectSubcellular #' @seealso createGiottoXeniumObject .createGiottoXeniumObject_aggregate #' @keywords internal -.createGiottoXeniumObject_subcellular = function(data_list, - key_list = NULL, - qv_threshold = 20, - instructions = NULL, - cores = NA, - verbose = TRUE) { - - # data.table vars - qv = NULL - - # Unpack data_list info - feat_meta = data_list$feat_meta - tx_dt = data_list$tx_dt - bound_dt_list = data_list$bound_dt_list - # cell_meta = data_list$cell_meta - # agg_expr = data_list$agg_expr - - # define for data.table - cell_id = feat_ID = feature_name = NULL - - vmsg('Building subcellular giotto object...', .v = verbose) - # Giotto points object - vmsg('> points data prep...', .v = verbose) - - # filter by qv_threshold - vmsg('> filtering feature detections for Phred score >= ', qv_threshold, .v = verbose) - n_before = tx_dt[,.N] - tx_dt_filtered = tx_dt[qv >= qv_threshold] - n_after = tx_dt_filtered[,.N] - - if(verbose) { - cat('Number of feature points removed: ', - n_before - n_after, - ' out of ', n_before, '\n') - } - - vmsg('> splitting detections by feat_type', .v = verbose) - # discover feat_IDs for each feat_type - all_IDs = tx_dt_filtered[, unique(feat_ID)] - feat_types_IDs = lapply(key_list, function(x) all_IDs[grepl(pattern = x, all_IDs)]) - rna = list('rna' = all_IDs[!all_IDs %in% unlist(feat_types_IDs)]) - feat_types_IDs = append(rna, feat_types_IDs) - - # separate detections by feature type - points_list = lapply( - feat_types_IDs, - function(types) { - tx_dt_filtered[feat_ID %in% types] - } - ) - - # Giotto polygons object - vmsg('> polygons data prep...', .v = verbose) - polys_list = lapply( - bound_dt_list, - function(bound_type) { - bound_type[, cell_id := as.character(cell_id)] +.createGiottoXeniumObject_subcellular <- function(data_list, + key_list = NULL, + qv_threshold = 20, + instructions = NULL, + cores = NA, + verbose = TRUE) { + # data.table vars + qv <- NULL + + # Unpack data_list info + feat_meta <- data_list$feat_meta + tx_dt <- data_list$tx_dt + bound_dt_list <- data_list$bound_dt_list + + # define for data.table + cell_id <- feat_ID <- feature_name <- NULL + + vmsg("Building subcellular giotto object...", .v = verbose) + # Giotto points object + vmsg("> points data prep...", .v = verbose) + + # filter by qv_threshold + vmsg("> filtering feature detections for Phred score >= ", + qv_threshold, .v = verbose) + n_before <- tx_dt[, .N] + tx_dt_filtered <- tx_dt[qv >= qv_threshold] + n_after <- tx_dt_filtered[, .N] + + if (verbose) { + cat( + "Number of feature points removed: ", + n_before - n_after, + " out of ", n_before, "\n" + ) } - ) - - xenium_gobject = createGiottoObjectSubcellular( - gpoints = points_list, - gpolygons 
= polys_list, - instructions = instructions, - cores = cores, - verbose = verbose - ) - - # generate centroids - vmsg('Calculating polygon centroids...', .v = verbose) - xenium_gobject = addSpatialCentroidLocations( - xenium_gobject, - poly_info = c(names(bound_dt_list)), - provenance = as.list(names(bound_dt_list)) - ) - - # add in feature metadata - # xenium_gobject = addFeatMetadata(gobject = xenium_gobject, - # new_metadata = feat_meta, - # by_column = TRUE, - # column_feat_ID = 'feat_ID') - - return(xenium_gobject) + vmsg("> splitting detections by feat_type", .v = verbose) + # discover feat_IDs for each feat_type + all_IDs <- tx_dt_filtered[, unique(feat_ID)] + feat_types_IDs <- lapply( + key_list, function(x) all_IDs[grepl(pattern = x, all_IDs)]) + rna <- list("rna" = all_IDs[!all_IDs %in% unlist(feat_types_IDs)]) + feat_types_IDs <- append(rna, feat_types_IDs) + + # separate detections by feature type + points_list <- lapply( + feat_types_IDs, + function(types) { + tx_dt_filtered[feat_ID %in% types] + } + ) + + # Giotto polygons object + vmsg("> polygons data prep...", .v = verbose) + polys_list <- lapply( + bound_dt_list, + function(bound_type) { + bound_type[, cell_id := as.character(cell_id)] + } + ) + + xenium_gobject <- createGiottoObjectSubcellular( + gpoints = points_list, + gpolygons = polys_list, + instructions = instructions, + cores = cores, + verbose = verbose + ) + + # generate centroids + vmsg("Calculating polygon centroids...", .v = verbose) + xenium_gobject <- addSpatialCentroidLocations( + xenium_gobject, + poly_info = c(names(bound_dt_list)), + provenance = as.list(names(bound_dt_list)) + ) + + return(xenium_gobject) } @@ -1853,56 +1930,58 @@ createGiottoXeniumObject = function(xenium_dir, #' @inheritParams GiottoClass::createGiottoObjectSubcellular #' @seealso createGiottoXeniumObject .createGiottoXeniumObject_subcellular #' @keywords internal -.createGiottoXeniumObject_aggregate = function(data_list, - # include_analysis = FALSE, - instructions = NULL, - cores = NA, - verbose = TRUE) { - - # Unpack data_list info - feat_meta = data_list$feat_meta - # tx_dt = data_list$tx_dt - # bound_dt_list = data_list$bound_dt_list - cell_meta = data_list$cell_meta - agg_expr = data_list$agg_expr - - # define for data.table - cell_ID = x_centroid = y_centroid = NULL - - # clean up names for aggregate matrices - names(agg_expr) = gsub(pattern = ' ', replacement = '_' ,names(agg_expr)) - geneExpMat = which(names(agg_expr) == 'Gene_Expression') - names(agg_expr)[[geneExpMat]] = 'raw' - - # set cell_id as character - cell_meta = cell_meta[, data.table::setnames(.SD, 'cell_id', 'cell_ID')] - cell_meta = cell_meta[, cell_ID := as.character(cell_ID)] - - # set up spatial locations - agg_spatlocs = cell_meta[, .(x_centroid, y_centroid, cell_ID)] - - # set up metadata - agg_meta = cell_meta[, !c('x_centroid','y_centroid')] - - vmsg('Building aggregate giotto object...', .v = verbose) - xenium_gobject = createGiottoObject(expression = agg_expr, - spatial_locs = agg_spatlocs, - instructions = instructions, - cores = cores, - verbose = verbose) - - # append aggregate metadata - xenium_gobject = addCellMetadata(gobject = xenium_gobject, - new_metadata = agg_meta, - by_column = TRUE, - column_cell_ID = 'cell_ID') - xenium_gobject = addFeatMetadata(gobject = xenium_gobject, - new_metadata = feat_meta, - by_column = TRUE, - column_feat_ID = 'feat_ID') - - return(xenium_gobject) +.createGiottoXeniumObject_aggregate <- function(data_list, + # include_analysis = FALSE, + instructions = NULL, + 
cores = NA, + verbose = TRUE) { + # Unpack data_list info + feat_meta <- data_list$feat_meta + cell_meta <- data_list$cell_meta + agg_expr <- data_list$agg_expr + # define for data.table + cell_ID <- x_centroid <- y_centroid <- NULL + + # clean up names for aggregate matrices + names(agg_expr) <- gsub(pattern = " ", replacement = "_", names(agg_expr)) + geneExpMat <- which(names(agg_expr) == "Gene_Expression") + names(agg_expr)[[geneExpMat]] <- "raw" + + # set cell_id as character + cell_meta <- cell_meta[, data.table::setnames(.SD, "cell_id", "cell_ID")] + cell_meta <- cell_meta[, cell_ID := as.character(cell_ID)] + + # set up spatial locations + agg_spatlocs <- cell_meta[, .(x_centroid, y_centroid, cell_ID)] + + # set up metadata + agg_meta <- cell_meta[, !c("x_centroid", "y_centroid")] + + vmsg("Building aggregate giotto object...", .v = verbose) + xenium_gobject <- createGiottoObject( + expression = agg_expr, + spatial_locs = agg_spatlocs, + instructions = instructions, + cores = cores, + verbose = verbose + ) + + # append aggregate metadata + xenium_gobject <- addCellMetadata( + gobject = xenium_gobject, + new_metadata = agg_meta, + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + xenium_gobject <- addFeatMetadata( + gobject = xenium_gobject, + new_metadata = feat_meta, + by_column = TRUE, + column_feat_ID = "feat_ID" + ) + + return(xenium_gobject) } @@ -1916,47 +1995,57 @@ createGiottoXeniumObject = function(xenium_dir, #' @describeIn read_data_folder Read a structured MERSCOPE folder #' @keywords internal - .read_merscope_folder <- function(merscope_dir, - data_to_use, - cores = NA, - verbose = NULL) { - - # prepare dir_items list - dir_items = list(`boundary info` = '*cell_boundaries*', - `image info` = '*images*', - `cell feature matrix` = '*cell_by_gene*', - `cell metadata` = '*cell_metadata*', - `raw transcript info` = '*transcripts*') - - # prepare require_data_DT - sub_reqs = data.table::data.table(workflow = c('subcellular'), - item = c('boundary info', - 'raw transcript info', - 'image info', - 'cell by gene matrix', - 'cell metadata'), - needed = c(TRUE, TRUE, FALSE, FALSE, FALSE)) - - agg_reqs = data.table::data.table(workflow = c('aggregate'), - item = c('boundary info', - 'raw transcript info', - 'image info', - 'cell by gene matrix', - 'cell metadata'), - needed = c(FALSE, FALSE, FALSE, TRUE, TRUE)) - - require_data_DT = rbind(sub_reqs, agg_reqs) - - dir_items = .read_data_folder(spat_method = 'MERSCOPE', - data_dir = merscope_dir, - dir_items = dir_items, - data_to_use = data_to_use, - require_data_DT = require_data_DT, - cores = cores, - verbose = verbose) - - return(dir_items) +.read_merscope_folder <- function(merscope_dir, + data_to_use, + cores = NA, + verbose = NULL) { + # prepare dir_items list + dir_items <- list( + `boundary info` = "*cell_boundaries*", + `image info` = "*images*", + `cell feature matrix` = "*cell_by_gene*", + `cell metadata` = "*cell_metadata*", + `raw transcript info` = "*transcripts*" + ) + # prepare require_data_DT + sub_reqs <- data.table::data.table( + workflow = c("subcellular"), + item = c( + "boundary info", + "raw transcript info", + "image info", + "cell by gene matrix", + "cell metadata" + ), + needed = c(TRUE, TRUE, FALSE, FALSE, FALSE) + ) + + agg_reqs <- data.table::data.table( + workflow = c("aggregate"), + item = c( + "boundary info", + "raw transcript info", + "image info", + "cell by gene matrix", + "cell metadata" + ), + needed = c(FALSE, FALSE, FALSE, TRUE, TRUE) + ) + + require_data_DT <- rbind(sub_reqs, agg_reqs) + + 
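# .read_data_folder() is the shared directory reader: it presumably globs
# each dir_items pattern under data_dir and, using require_data_DT, errors
# on missing items marked as needed for the chosen workflow while only
# warning for the optional ones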
dir_items <- .read_data_folder( + spat_method = "MERSCOPE", + data_dir = merscope_dir, + dir_items = dir_items, + data_to_use = data_to_use, + require_data_DT = require_data_DT, + cores = cores, + verbose = verbose + ) + + return(dir_items) } @@ -1968,52 +2057,55 @@ createGiottoXeniumObject = function(xenium_dir, #' @return path_list a list of cosmx files discovered and their filepaths. NULL #' values denote missing items #' @keywords internal -.read_cosmx_folder = function(cosmx_dir, - verbose = TRUE) { - - ch = box_chars() - - if(is.null(cosmx_dir) | !dir.exists(cosmx_dir)) stop('The full path to a cosmx directory must be given.\n') - vmsg('A structured CosMx directory will be used\n', .v = verbose) - - # find directories (length = 1 if present, length = 0 if missing) - dir_items = list(`CellLabels folder` = '*CellLabels', - `CompartmentLabels folder` = '*CompartmentLabels', - `CellComposite folder` = '*CellComposite', - `CellOverlay folder` = '*CellOverlay', - `transcript locations file` = '*tx_file*', - `fov positions file` = '*fov_positions_file*', - `expression matrix file` = '*exprMat_file*', - `metadata file` = '*metadata_file*') - dir_items = lapply(dir_items, function(x) Sys.glob(paths = file.path(cosmx_dir, x))) - dir_items_lengths = lengths(dir_items) - - if(isTRUE(verbose)) { - message('Checking directory contents...') - for(item in names(dir_items)) { - if(dir_items_lengths[[item]] > 0) { - message(ch$s, '> ' ,item, ' found') - } else { - warning(item, ' is missing\n') - } +.read_cosmx_folder <- function(cosmx_dir, + verbose = TRUE) { + ch <- box_chars() + + if (is.null(cosmx_dir) | !dir.exists(cosmx_dir)) + stop("The full path to a cosmx directory must be given.") + vmsg("A structured CosMx directory will be used\n", .v = verbose) + + # find directories (length = 1 if present, length = 0 if missing) + dir_items <- list( + `CellLabels folder` = "*CellLabels", + `CompartmentLabels folder` = "*CompartmentLabels", + `CellComposite folder` = "*CellComposite", + `CellOverlay folder` = "*CellOverlay", + `transcript locations file` = "*tx_file*", + `fov positions file` = "*fov_positions_file*", + `expression matrix file` = "*exprMat_file*", + `metadata file` = "*metadata_file*" + ) + dir_items <- lapply( + dir_items, function(x) Sys.glob(paths = file.path(cosmx_dir, x))) + dir_items_lengths <- lengths(dir_items) + + if (isTRUE(verbose)) { + message("Checking directory contents...") + for (item in names(dir_items)) { + if (dir_items_lengths[[item]] > 0) { + message(ch$s, "> ", item, " found") + } else { + warning(item, " is missing\n") + } + } } - } - # select first directory in list if multiple are detected - if(any(dir_items_lengths > 1)) { - warning('Multiple matches for expected subdirectory item(s).\n First matching item selected') + # select first directory in list if multiple are detected + if (any(dir_items_lengths > 1)) { + warning("Multiple matches for expected subdirectory item(s).\n + First matching item selected") - multiples = which(dir_items_lengths > 1) - for(mult_i in multiples) { - message(names(dir_items)[[mult_i]], 'multiple matches found:') - print(dir_items[[mult_i]]) - dir_items[[mult_i]] = dir_items[[mult_i]][[1]] + multiples <- which(dir_items_lengths > 1) + for (mult_i in multiples) { + message(names(dir_items)[[mult_i]], "multiple matches found:") + print(dir_items[[mult_i]]) + dir_items[[mult_i]] <- dir_items[[mult_i]][[1]] + } } - } - vmsg('Directory check done', .v = verbose) - - return(dir_items) + vmsg("Directory check done", .v = verbose) + 
return(dir_items) } @@ -2025,137 +2117,167 @@ createGiottoXeniumObject = function(xenium_dir, #' @keywords internal #' @return path_list a list of xenium files discovered and their filepaths. NULL #' values denote missing items -.read_xenium_folder = function(xenium_dir, - data_to_use = 'subcellular', - bounds_to_load = c('cell'), - load_format = 'csv', - h5_expression = FALSE, - verbose = TRUE) { - - # Check needed packages - if(load_format == 'parquet') { - package_check(pkg_name = 'arrow', repository = 'CRAN') - package_check(pkg_name = 'dplyr', repository = 'CRAN') - } - if(isTRUE(h5_expression)) { - package_check(pkg_name = 'hdf5r', repository = 'CRAN') - } - - ch = box_chars() - - - # 0. test if folder structure exists and is as expected - - - if(is.null(xenium_dir) | !dir.exists(xenium_dir)) stop('The full path to a xenium directory must be given.\n') - vmsg('A structured Xenium directory will be used\n', .v = verbose) - - # find items (length = 1 if present, length = 0 if missing) - dir_items = list(`analysis info` = '*analysis*', - `boundary info` = '*bound*', - `cell feature matrix` = '*cell_feature_matrix*', - `cell metadata` = '*cells*', - `image info` = '*tif', - `panel metadata` = '*panel*', - `raw transcript info` = '*transcripts*', - `experiment info (.xenium)` = '*.xenium') - - dir_items = lapply(dir_items, function(x) Sys.glob(paths = file.path(xenium_dir, x))) - dir_items_lengths = lengths(dir_items) - - if(isTRUE(verbose)) { - message('Checking directory contents...') - for(item in names(dir_items)) { - - # IF ITEM FOUND - - if(dir_items_lengths[[item]] > 0) { - message(ch$s, '> ' ,item, ' found') - for(item_i in seq_along(dir_items[[item]])) { # print found item names - subItem = gsub(pattern = '.*/', replacement = '', x = dir_items[[item]][[item_i]]) - message(ch$s, ch$s, ch$l,ch$h,ch$h, subItem) +.read_xenium_folder <- function(xenium_dir, + data_to_use = "subcellular", + bounds_to_load = c("cell"), + load_format = "csv", + h5_expression = FALSE, + verbose = TRUE) { + # Check needed packages + if (load_format == "parquet") { + package_check(pkg_name = "arrow", repository = "CRAN") + package_check(pkg_name = "dplyr", repository = "CRAN") + } + if (isTRUE(h5_expression)) { + package_check(pkg_name = "hdf5r", repository = "CRAN") + } + + ch <- box_chars() + + + # 0. 
test if folder structure exists and is as expected + + + if (is.null(xenium_dir) | !dir.exists(xenium_dir)) + stop("The full path to a xenium directory must be given.") + vmsg("A structured Xenium directory will be used\n", .v = verbose) + + # find items (length = 1 if present, length = 0 if missing) + dir_items <- list( + `analysis info` = "*analysis*", + `boundary info` = "*bound*", + `cell feature matrix` = "*cell_feature_matrix*", + `cell metadata` = "*cells*", + `image info` = "*tif", + `panel metadata` = "*panel*", + `raw transcript info` = "*transcripts*", + `experiment info (.xenium)` = "*.xenium" + ) + + dir_items <- lapply( + dir_items, function(x) Sys.glob(paths = file.path(xenium_dir, x))) + dir_items_lengths <- lengths(dir_items) + + if (isTRUE(verbose)) { + message("Checking directory contents...") + for (item in names(dir_items)) { + # IF ITEM FOUND + + if (dir_items_lengths[[item]] > 0) { + message(ch$s, "> ", item, " found") + for (item_i in seq_along(dir_items[[item]])) { + # print found item names + subItem <- gsub(pattern = ".*/", replacement = "", + x = dir_items[[item]][[item_i]]) + message(ch$s, ch$s, ch$l, ch$h, ch$h, subItem) + } + } else { + # IF ITEM MISSING + # Based on workflow, determine if: + # necessary (error) + # optional (warning) + + if (data_to_use == "subcellular") { + # necessary items + if (item %in% c("boundary info", "raw transcript info")) + stop(item, " is missing") + # optional items + if (item %in% c( + "image info", "experiment info (.xenium)", + "panel metadata")) + warning(item, " is missing (optional)") + # items to ignore: analysis info, cell feature matrix, + # cell metadata + } else if (data_to_use == "aggregate") { + # necessary items + if (item %in% c("cell feature matrix", "cell metadata")) + stop(item, " is missing") + # optional items + if (item %in% c( + "image info", "experiment info (.xenium)", + "panel metadata", "analysis info")) + warning(item, " is missing (optional)") + # items to ignore: boundary info, raw transcript info + } + } + } + } + + + # 1. 
Select data to load + + + # **** transcript info **** + tx_path <- NULL + tx_path <- dir_items$`raw transcript info`[grepl( + pattern = load_format, dir_items$`raw transcript info`)] + # **** cell metadata **** + cell_meta_path <- NULL + cell_meta_path <- dir_items$`cell metadata`[grepl( + pattern = load_format, dir_items$`cell metadata`)] + + # **** boundary info **** + # Select bound load format + if (load_format != "zarr") { # No zarr available for boundary info + dir_items$`boundary info` <- dir_items$`boundary info`[grepl( + pattern = load_format, dir_items$`boundary info`)] + } else { + dir_items$`boundary info` <- dir_items$`boundary info`[grepl( + pattern = "csv", dir_items$`boundary info`)] + } + + # Organize bound paths by type of bound (bounds_to_load param) + bound_paths <- NULL + bound_names <- bounds_to_load + bounds_to_load <- as.list(bounds_to_load) + bound_paths <- lapply(bounds_to_load, function(x) dir_items$`boundary info`[ + grepl(pattern = x, dir_items$`boundary info`)]) + names(bound_paths) <- bound_names + + # **** aggregated expression info **** + agg_expr_path <- NULL + if (isTRUE(h5_expression)) { # h5 expression matrix loading is default + agg_expr_path <- dir_items$`cell feature matrix`[grepl( + pattern = "h5", dir_items$`cell feature matrix`)] + } else if (load_format == "zarr") { + agg_expr_path <- dir_items$`cell feature matrix`[grepl( + pattern = "zarr", dir_items$`cell feature matrix`)] + } else { # No parquet for aggregated expression - default to normal 10x loading + agg_expr_path <- dir_items$`cell feature matrix`[sapply( + dir_items$`cell feature matrix`, function(x) file_test(op = "-d", x))] + if (length(agg_expr_path) == 0) { + stop(wrap_txt( + "Expression matrix cannot be loaded.\n + Has cell_feature_matrix(.tar.gz) been unpacked into a + directory?" + )) } - } else { - - # IF ITEM MISSING - # Based on workflow, determine if: - # necessary (error) - # optional (warning) - - if(data_to_use == 'subcellular') { - # necessary items - if(item %in% c('boundary info', 'raw transcript info')) stop(item, ' is missing\n') - # optional items - if(item %in% c('image info', 'experiment info (.xenium)', 'panel metadata')) warning(item, ' is missing (optional)\n') - # items to ignore: analysis info, cell feature matrix, cell metadata - } else if(data_to_use == 'aggregate') { - # necessary items - if(item %in% c('cell feature matrix', 'cell metadata')) stop(item, ' is missing\n') - # optional items - if(item %in% c('image info', 'experiment info (.xenium)', 'panel metadata', 'analysis info')) warning(item, ' is missing (optional)\n') - # items to ignore: boundary info, raw transcript info + } + if (data_to_use == "aggregate") { + if (length(path_list$agg_expr_path) == 0) { + stop(wrap_txt( + "Aggregated expression not found.\n + Please confirm h5_expression and load_format params are correct" + )) } - } } - } - - - # 1. 
Select data to load - - - # **** transcript info **** - tx_path = NULL - tx_path = dir_items$`raw transcript info`[grepl(pattern = load_format, dir_items$`raw transcript info`)] - # **** cell metadata **** - cell_meta_path = NULL - cell_meta_path = dir_items$`cell metadata`[grepl(pattern = load_format, dir_items$`cell metadata`)] - - # **** boundary info **** - # Select bound load format - if(load_format != 'zarr') { # No zarr available for boundary info - dir_items$`boundary info` = dir_items$`boundary info`[grepl(pattern = load_format, dir_items$`boundary info`)] - } else dir_items$`boundary info` = dir_items$`boundary info`[grepl(pattern = 'csv', dir_items$`boundary info`)] - - # Organize bound paths by type of bound (bounds_to_load param) - bound_paths = NULL - bound_names = bounds_to_load - bounds_to_load = as.list(bounds_to_load) - bound_paths = lapply(bounds_to_load, function(x) dir_items$`boundary info`[grepl(pattern = x, dir_items$`boundary info`)]) - names(bound_paths) = bound_names - - # **** aggregated expression info **** - agg_expr_path = NULL - if(isTRUE(h5_expression)) { # h5 expression matrix loading is default - agg_expr_path = dir_items$`cell feature matrix`[grepl(pattern = 'h5', dir_items$`cell feature matrix`)] - } else if(load_format == 'zarr') { - agg_expr_path = dir_items$`cell feature matrix`[grepl(pattern = 'zarr', dir_items$`cell feature matrix`)] - } else { # No parquet for aggregated expression - default to normal 10x loading - agg_expr_path = dir_items$`cell feature matrix`[sapply(dir_items$`cell feature matrix`, function(x) file_test(op = '-d', x))] - if(length(agg_expr_path) == 0) stop(wrap_txt( - 'Expression matrix cannot be loaded.\nHas cell_feature_matrix(.tar.gz) been unpacked into a directory?' - )) - } - if(data_to_use == 'aggregate') { - if(length(path_list$agg_expr_path) == 0) stop(wrap_txt( - 'Aggregated expression not found.\nPlease confirm h5_expression and load_format params are correct\n' - )) - } - - # **** panel info **** - panel_meta_path = NULL - panel_meta_path = dir_items$`panel metadata` - - - vmsg('Directory check done', .v = verbose) - - path_list = list('tx_path' = tx_path, - 'bound_paths' = bound_paths, - 'cell_meta_path' = cell_meta_path, - 'agg_expr_path' = agg_expr_path, - 'panel_meta_path' = panel_meta_path) - - return(path_list) + # **** panel info **** + panel_meta_path <- NULL + panel_meta_path <- dir_items$`panel metadata` + + + vmsg("Directory check done", .v = verbose) + + path_list <- list( + "tx_path" = tx_path, + "bound_paths" = bound_paths, + "cell_meta_path" = cell_meta_path, + "agg_expr_path" = agg_expr_path, + "panel_meta_path" = panel_meta_path + ) + + return(path_list) } @@ -2171,137 +2293,150 @@ createGiottoXeniumObject = function(xenium_dir, #' @title Load MERSCOPE data from folder #' @name load_merscope_folder -#' @param dir_items list of full filepaths from \code{\link{.read_merscope_folder}} +#' @param dir_items list of full filepaths from +#' \code{\link{.read_merscope_folder}} #' @inheritParams createGiottoMerscopeObject #' @return list of loaded-in MERSCOPE data NULL #' @rdname load_merscope_folder #' @keywords internal -.load_merscope_folder = function(dir_items, - data_to_use, - fovs = NULL, - poly_z_indices = 1L:7L, - cores = NA, - verbose = TRUE) { - - # 1. 
load data_to_use-specific - if(data_to_use == 'subcellular') { - data_list = .load_merscope_folder_subcellular(dir_items = dir_items, - data_to_use = data_to_use, - fovs = fovs, - poly_z_indices = poly_z_indices, - cores = cores, - verbose = verbose) - } else if(data_to_use == 'aggregate') { - data_list = .load_merscope_folder_aggregate(dir_items = dir_items, - data_to_use = data_to_use, - cores = cores, - verbose = verbose) - } else { - stop(wrap_txt('data_to_use "', data_to_use, '" not implemented', sep = '')) - } - - # 2. Load images if available - if(!is.null(dir_items$`image info`)) { - ## micron to px scaling factor - micronToPixelScale = Sys.glob(paths = file.path(dir_items$`image info`, '*micron_to_mosaic_pixel_transform*'))[[1]] - micronToPixelScale = data.table::fread(micronToPixelScale, nThread = cores) - # add to data_list - data_list$micronToPixelScale = micronToPixelScale - - ## staining images - ## determine types of stains - images_filenames = list.files(dir_items$`image info`) - bound_stains_filenames = images_filenames[grep(pattern = '.tif', images_filenames)] - bound_stains_types = sapply(strsplit(bound_stains_filenames, '_'), `[`, 2) - bound_stains_types = unique(bound_stains_types) - - img_list = lapply_flex(bound_stains_types, function(stype) { - img_paths = Sys.glob(paths = file.path(dir_items$`image info`, paste0('*',stype,'*'))) - - lapply_flex(img_paths, function(img) { - createGiottoLargeImage(raster_object = img) - }, cores = cores) - }, cores = cores) - # add to data_list - data_list$images = img_list - } +.load_merscope_folder <- function(dir_items, + data_to_use, + fovs = NULL, + poly_z_indices = 1L:7L, + cores = NA, + verbose = TRUE) { + # 1. load data_to_use-specific + if (data_to_use == "subcellular") { + data_list <- .load_merscope_folder_subcellular( + dir_items = dir_items, + data_to_use = data_to_use, + fovs = fovs, + poly_z_indices = poly_z_indices, + cores = cores, + verbose = verbose + ) + } else if (data_to_use == "aggregate") { + data_list <- .load_merscope_folder_aggregate( + dir_items = dir_items, + data_to_use = data_to_use, + cores = cores, + verbose = verbose + ) + } else { + stop(wrap_txt('data_to_use "', data_to_use, + '" not implemented', sep = "")) + } + # 2. 
Load images if available + if (!is.null(dir_items$`image info`)) { + ## micron to px scaling factor + micronToPixelScale <- Sys.glob(paths = file.path( + dir_items$`image info`, "*micron_to_mosaic_pixel_transform*"))[[1]] + micronToPixelScale <- data.table::fread( + micronToPixelScale, nThread = cores) + # add to data_list + data_list$micronToPixelScale <- micronToPixelScale + + ## staining images + ## determine types of stains + images_filenames <- list.files(dir_items$`image info`) + bound_stains_filenames <- images_filenames[ + grep(pattern = ".tif", images_filenames)] + bound_stains_types <- sapply(strsplit( + bound_stains_filenames, "_"), `[`, 2) + bound_stains_types <- unique(bound_stains_types) + + img_list <- lapply_flex(bound_stains_types, function(stype) { + img_paths <- Sys.glob(paths = file.path( + dir_items$`image info`, paste0("*", stype, "*"))) + + lapply_flex(img_paths, function(img) { + createGiottoLargeImage(raster_object = img) + }, cores = cores) + }, cores = cores) + # add to data_list + data_list$images <- img_list + } - return(data_list) + return(data_list) } #' @describeIn load_merscope_folder Load items for 'subcellular' workflow #' @keywords internal -.load_merscope_folder_subcellular = function(dir_items, - data_to_use, - cores = NA, - poly_z_indices = 1L:7L, - verbose = TRUE, - fovs = NULL) { - - if(isTRUE(verbose)) wrap_msg('Loading transcript level info...') - if(is.null(fovs)) { - tx_dt = data.table::fread(dir_items$`raw transcript info`, nThread = cores) - } else { - vmsg('Selecting FOV subset transcripts') - tx_dt = fread_colmatch(file = dir_items$`raw transcript info`, - col = 'fov', - values_to_match = fovs, - verbose = FALSE, - nThread = cores) - } - tx_dt[, c('x','y') := NULL] # remove unneeded cols - data.table::setcolorder(tx_dt, c('gene', 'global_x', 'global_y', 'global_z')) - - if(isTRUE(verbose)) wrap_msg('Loading polygon info...') - poly_info = readPolygonFilesVizgenHDF5(boundaries_path = dir_items$`boundary info`, - z_indices = poly_z_indices, - flip_y_axis = TRUE, - fovs = fovs) - - data_list = list( - 'poly_info' = poly_info, - 'tx_dt' = tx_dt, - 'micronToPixelScale' = NULL, - 'expr_dt' = NULL, - 'cell_meta' = NULL, - 'images' = NULL - ) +.load_merscope_folder_subcellular <- function(dir_items, + data_to_use, + cores = NA, + poly_z_indices = 1L:7L, + verbose = TRUE, + fovs = NULL) { + if (isTRUE(verbose)) message("Loading transcript level info...") + if (is.null(fovs)) { + tx_dt <- data.table::fread( + dir_items$`raw transcript info`, nThread = cores) + } else { + message("Selecting FOV subset transcripts") + tx_dt <- fread_colmatch( + file = dir_items$`raw transcript info`, + col = "fov", + values_to_match = fovs, + verbose = FALSE, + nThread = cores + ) + } + tx_dt[, c("x", "y") := NULL] # remove unneeded cols + data.table::setcolorder( + tx_dt, c("gene", "global_x", "global_y", "global_z")) + + if (isTRUE(verbose)) message("Loading polygon info...") + poly_info <- readPolygonFilesVizgenHDF5( + boundaries_path = dir_items$`boundary info`, + z_indices = poly_z_indices, + flip_y_axis = TRUE, + fovs = fovs + ) + data_list <- list( + "poly_info" = poly_info, + "tx_dt" = tx_dt, + "micronToPixelScale" = NULL, + "expr_dt" = NULL, + "cell_meta" = NULL, + "images" = NULL + ) } #' @describeIn load_merscope_folder Load items for 'aggregate' workflow #' @keywords internal -.load_merscope_folder_aggregate = function(dir_items, - data_to_use, - cores = NA, - verbose = TRUE) { - - # metadata is polygon-related measurements - vmsg('Loading cell metadata...', 
.v = verbose) - cell_metadata_file = data.table::fread(dir_items$`cell metadata`, nThread = cores) - - vmsg('Loading expression matrix', .v = verbose) - expr_dt = data.table::fread(dir_items$`cell feature matrix`, nThread = cores) - - - data_list = list( - 'poly_info' = NULL, - 'tx_dt' = NULL, - 'micronToPixelScale' = NULL, - 'expr_dt' = expr_dt, - 'cell_meta' = cell_metadata_file, - 'images' = NULL - ) - +.load_merscope_folder_aggregate <- function(dir_items, + data_to_use, + cores = NA, + verbose = TRUE) { + # metadata is polygon-related measurements + vmsg("Loading cell metadata...", .v = verbose) + cell_metadata_file <- data.table::fread( + dir_items$`cell metadata`, nThread = cores) + + vmsg("Loading expression matrix", .v = verbose) + expr_dt <- data.table::fread( + dir_items$`cell feature matrix`, nThread = cores) + + + data_list <- list( + "poly_info" = NULL, + "tx_dt" = NULL, + "micronToPixelScale" = NULL, + "expr_dt" = expr_dt, + "cell_meta" = cell_metadata_file, + "images" = NULL + ) } @@ -2315,43 +2450,45 @@ NULL #' @title Load CosMx folder subcellular info #' @name .load_cosmx_folder_subcellular #' @description loads in the feature detections information. Note that the mask -#' images are still required for a working subcellular object, and those are loaded -#' in \code{\link{.createGiottoCosMxObject_subcellular}} +#' images are still required for a working subcellular object, and those are +#' loaded in \code{\link{.createGiottoCosMxObject_subcellular}} #' @inheritParams createGiottoCosMxObject #' @keywords internal -.load_cosmx_folder_subcellular = function(dir_items, - FOVs = NULL, - cores, - verbose = TRUE) { - - vmsg(.v = verbose, 'Loading subcellular information...') - - # subcellular checks - if(!file.exists(dir_items$`transcript locations file`)) - stop(wrap_txt('No transcript locations file (.csv) detected')) - if(!file.exists(dir_items$`fov positions file`)) - stop(wrap_txt('No fov positions file (.csv) detected')) +.load_cosmx_folder_subcellular <- function(dir_items, + FOVs = NULL, + cores, + verbose = TRUE) { + vmsg(.v = verbose, "Loading subcellular information...") - # FOVs to load - vmsg(.v = verbose, 'Loading FOV offsets...') - fov_offset_file = fread(input = dir_items$`fov positions file`, nThread = cores) - if(is.null(FOVs)) FOVs = fov_offset_file$fov # default to ALL FOVs - FOV_ID = as.list(sprintf('%03d', FOVs)) + # subcellular checks + if (!file.exists(dir_items$`transcript locations file`)) { + stop(wrap_txt("No transcript locations file (.csv) detected")) + } + if (!file.exists(dir_items$`fov positions file`)) { + stop(wrap_txt("No fov positions file (.csv) detected")) + } - #TODO Load only relevant portions of file? + # FOVs to load + vmsg(.v = verbose, "Loading FOV offsets...") + fov_offset_file <- fread( + input = dir_items$`fov positions file`, nThread = cores) + if (is.null(FOVs)) FOVs <- fov_offset_file$fov # default to ALL FOVs + FOV_ID <- as.list(sprintf("%03d", FOVs)) - vmsg(.v = verbose, 'Loading transcript level info...') - tx_coord_all = fread(input = dir_items$`transcript locations file`, nThread = cores) - vmsg(.v = verbose, 'Subcellular load done') + # TODO Load only relevant portions of file? 
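+    # FOV_ID keeps the requested FOVs as zero-padded three-digit strings
+    # (e.g. 4 becomes "004"), which is assumed to match the zero-padded FOV
+    # tags in the per-FOV CosMx files and images used downstream.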
- data_list = list( - 'FOV_ID' = FOV_ID, - 'fov_offset_file' = fov_offset_file, - 'tx_coord_all' = tx_coord_all - ) + vmsg(.v = verbose, "Loading transcript level info...") + tx_coord_all <- fread( + input = dir_items$`transcript locations file`, nThread = cores) + vmsg(.v = verbose, "Subcellular load done") - return(data_list) + data_list <- list( + "FOV_ID" = FOV_ID, + "fov_offset_file" = fov_offset_file, + "tx_coord_all" = tx_coord_all + ) + return(data_list) } @@ -2360,98 +2497,112 @@ NULL #' @name .load_cosmx_folder_aggregate #' @inheritParams createGiottoCosMxObject #' @keywords internal -.load_cosmx_folder_aggregate = function(dir_items, - cores, - verbose = TRUE) { - - # data.table vars - fov = cell_ID = fov_cell_ID = CenterX_global_px = CenterY_global_px = CenterX_local_px = - CenterY_local_px = x_shift = y_shift = NULL - - # load aggregate information - vmsg(.v = verbose, 'Loading provided aggregated information...') - - # aggregate checks - if(!file.exists(dir_items$`expression matrix file`)) stop(wrap_txt('No expression matrix file (.csv) detected')) - if(!file.exists(dir_items$`metadata file`)) stop(wrap_txt('No metadata file (.csv) detected. Needed for cell spatial locations.')) - - # read in aggregate data - expr_mat = fread(input = dir_items$`expression matrix file`, nThread = cores) - metadata = fread(input = dir_items$`metadata file`, nThread = cores) - - # setorder expression and spatlocs - data.table::setorder(metadata, fov, cell_ID) - data.table::setorder(expr_mat, fov, cell_ID) - - - # generate unique cell IDs - expr_mat[, cell_ID := paste0('fov', sprintf('%03d', fov), '-', 'cell_', cell_ID)] - # expr_mat$cell_ID = paste0('fov', sprintf('%03d', expr_mat$fov), '-', 'cell_', expr_mat$cell_ID) - expr_mat = expr_mat[, fov := NULL] - - metadata[, fov_cell_ID := cell_ID] - metadata[, cell_ID := paste0('fov', sprintf('%03d', fov), '-', 'cell_', cell_ID)] - # metadata$cell_ID = paste0('fov', sprintf('%03d', metadata$fov), '-', 'cell_', metadata$cell_ID) - # reorder - data.table::setcolorder(x = metadata, c('cell_ID','fov','fov_cell_ID')) - - - # extract spatial locations - spatlocs = metadata[,.(CenterX_global_px, CenterY_global_px, cell_ID)] - spatlocs_fov = metadata[,.(CenterX_local_px, CenterY_local_px, cell_ID)] - # regenerate FOV shifts - metadata[, x_shift := CenterX_global_px - CenterX_local_px] - metadata[, y_shift := CenterY_global_px - CenterY_local_px] - fov_shifts = metadata[, .(mean(x_shift), mean(y_shift)), fov] - colnames(fov_shifts) = c('fov', 'x_shift', 'y_shift') - - - # rename spatloc column names - spatloc_oldnames = c('CenterX_global_px', 'CenterY_global_px', 'cell_ID') - spatloc_oldnames_fov = c('CenterX_local_px', 'CenterY_local_px', 'cell_ID') - spatloc_newnames = c('sdimx', 'sdimy', 'cell_ID') - data.table::setnames(spatlocs, old = spatloc_oldnames, new = spatloc_newnames) - data.table::setnames(spatlocs_fov, old = spatloc_oldnames_fov, new = spatloc_newnames) - - # cleanup metadata and spatlocs - metadata = metadata[,c('CenterX_global_px', 'CenterY_global_px', 'CenterX_local_px', 'CenterY_local_px') := NULL] - # find unique cell_IDs present in both expression and metadata - giotto_cell_ID = unique(intersect(expr_mat$cell_ID, metadata$cell_ID)) - - # subset to only unique cell_IDs - expr_mat = expr_mat[cell_ID %in% giotto_cell_ID,] - metadata = metadata[cell_ID %in% giotto_cell_ID,] - - - # convert protein metadata to expr mat - # take all mean intensity protein information except for MembraneStain and DAPI - protein_meta_cols = colnames(metadata) - 
protein_meta_cols = protein_meta_cols[grepl(pattern = 'Mean.*', x = protein_meta_cols)] - protein_meta_cols = protein_meta_cols[!protein_meta_cols %in% c('Mean.MembraneStain', 'Mean.DAPI')] - protein_meta_cols = c('cell_ID', protein_meta_cols) - - prot_expr = metadata[, protein_meta_cols, with = FALSE] - prot_cell_ID = metadata[, cell_ID] - protM = Matrix::Matrix(as.matrix(prot_expr[,-1]), dimnames = list(prot_expr[[1]], colnames(prot_expr[,-1])), sparse = FALSE) - protM = t_flex(protM) - - # convert expression to sparse matrix - spM = Matrix::Matrix(as.matrix(expr_mat[,-1]), dimnames = list(expr_mat[[1]], colnames(expr_mat[,-1])), sparse = TRUE) - spM = t_flex(spM) - - ## Ready for downstream aggregate gobject creation or appending into existing subcellular Giotto object ## - - data_list = list( - 'spatlocs' = spatlocs, - 'spatlocs_fov' = spatlocs_fov, - 'metadata' = metadata, - 'protM' = protM, - 'spM' = spM, - 'fov_shifts' = fov_shifts - ) - - return(data_list) +.load_cosmx_folder_aggregate <- function(dir_items, + cores, + verbose = TRUE) { + # data.table vars + fov <- cell_ID <- fov_cell_ID <- CenterX_global_px <- + CenterY_global_px <- CenterX_local_px <- + CenterY_local_px <- x_shift <- y_shift <- NULL + + # load aggregate information + vmsg(.v = verbose, "Loading provided aggregated information...") + + # aggregate checks + if (!file.exists(dir_items$`expression matrix file`)) + stop(wrap_txt("No expression matrix file (.csv) detected")) + if (!file.exists(dir_items$`metadata file`)) + stop(wrap_txt("No metadata file (.csv) detected. Needed for cell + spatial locations.")) + + # read in aggregate data + expr_mat <- fread( + input = dir_items$`expression matrix file`, nThread = cores) + metadata <- fread(input = dir_items$`metadata file`, nThread = cores) + + # setorder expression and spatlocs + data.table::setorder(metadata, fov, cell_ID) + data.table::setorder(expr_mat, fov, cell_ID) + + + # generate unique cell IDs + expr_mat[, cell_ID := paste0( + "fov", sprintf("%03d", fov), "-", "cell_", cell_ID)] + expr_mat <- expr_mat[, fov := NULL] + + metadata[, fov_cell_ID := cell_ID] + metadata[, cell_ID := paste0( + "fov", sprintf("%03d", fov), "-", "cell_", cell_ID)] + # reorder + data.table::setcolorder(x = metadata, c("cell_ID", "fov", "fov_cell_ID")) + + + # extract spatial locations + spatlocs <- metadata[, .(CenterX_global_px, CenterY_global_px, cell_ID)] + spatlocs_fov <- metadata[, .(CenterX_local_px, CenterY_local_px, cell_ID)] + # regenerate FOV shifts + metadata[, x_shift := CenterX_global_px - CenterX_local_px] + metadata[, y_shift := CenterY_global_px - CenterY_local_px] + fov_shifts <- metadata[, .(mean(x_shift), mean(y_shift)), fov] + colnames(fov_shifts) <- c("fov", "x_shift", "y_shift") + + + # rename spatloc column names + spatloc_oldnames <- c("CenterX_global_px", "CenterY_global_px", "cell_ID") + spatloc_oldnames_fov <- c("CenterX_local_px", "CenterY_local_px", "cell_ID") + spatloc_newnames <- c("sdimx", "sdimy", "cell_ID") + data.table::setnames(spatlocs, old = spatloc_oldnames, new = spatloc_newnames) + data.table::setnames( + spatlocs_fov, old = spatloc_oldnames_fov, new = spatloc_newnames) + + # cleanup metadata and spatlocs + metadata <- metadata[, c("CenterX_global_px", "CenterY_global_px", + "CenterX_local_px", "CenterY_local_px") := NULL] + # find unique cell_IDs present in both expression and metadata + giotto_cell_ID <- unique(intersect(expr_mat$cell_ID, metadata$cell_ID)) + + # subset to only unique cell_IDs + expr_mat <- expr_mat[cell_ID %in% 
giotto_cell_ID, ] + metadata <- metadata[cell_ID %in% giotto_cell_ID, ] + + + # convert protein metadata to expr mat + # take all mean intensity protein information except for MembraneStain and DAPI + protein_meta_cols <- colnames(metadata) + protein_meta_cols <- protein_meta_cols[ + grepl(pattern = "Mean.*", x = protein_meta_cols)] + protein_meta_cols <- protein_meta_cols[ + !protein_meta_cols %in% c("Mean.MembraneStain", "Mean.DAPI")] + protein_meta_cols <- c("cell_ID", protein_meta_cols) + + prot_expr <- metadata[, protein_meta_cols, with = FALSE] + prot_cell_ID <- metadata[, cell_ID] + protM <- Matrix::Matrix(as.matrix(prot_expr[, -1]), + dimnames = list(prot_expr[[1]], + colnames(prot_expr[, -1])), + sparse = FALSE) + protM <- t_flex(protM) + + # convert expression to sparse matrix + spM <- Matrix::Matrix(as.matrix(expr_mat[, -1]), + dimnames = list(expr_mat[[1]], + colnames(expr_mat[, -1])), + sparse = TRUE) + spM <- t_flex(spM) + + ## Ready for downstream aggregate gobject creation or appending into + # existing subcellular Giotto object ## + + data_list <- list( + "spatlocs" = spatlocs, + "spatlocs_fov" = spatlocs_fov, + "metadata" = metadata, + "protM" = protM, + "spM" = spM, + "fov_shifts" = fov_shifts + ) + return(data_list) } @@ -2471,136 +2622,140 @@ NULL #' @rdname load_xenium_folder #' @keywords internal -.load_xenium_folder = function(path_list, - load_format = 'csv', - data_to_use = 'subcellular', - h5_expression = 'FALSE', - h5_gene_ids = 'symbols', - gene_column_index = 1, - cores, - verbose = TRUE) { - - data_list = switch( - load_format, - "csv" = .load_xenium_folder_csv( - path_list = path_list, - data_to_use = data_to_use, - h5_expression = h5_expression, - h5_gene_ids = h5_gene_ids, - gene_column_index = gene_column_index, - cores = cores, - verbose = verbose - ), - "parquet" = .load_xenium_folder_parquet( - path_list = path_list, - data_to_use = data_to_use, - h5_expression = h5_expression, - h5_gene_ids = h5_gene_ids, - gene_column_index = gene_column_index, - cores = cores, - verbose = verbose - ), - "zarr" = stop("load_format zarr:\n Not yet implemented", call. = FALSE) - ) +.load_xenium_folder <- function(path_list, + load_format = "csv", + data_to_use = "subcellular", + h5_expression = "FALSE", + h5_gene_ids = "symbols", + gene_column_index = 1, + cores, + verbose = TRUE) { + data_list <- switch(load_format, + "csv" = .load_xenium_folder_csv( + path_list = path_list, + data_to_use = data_to_use, + h5_expression = h5_expression, + h5_gene_ids = h5_gene_ids, + gene_column_index = gene_column_index, + cores = cores, + verbose = verbose + ), + "parquet" = .load_xenium_folder_parquet( + path_list = path_list, + data_to_use = data_to_use, + h5_expression = h5_expression, + h5_gene_ids = h5_gene_ids, + gene_column_index = gene_column_index, + cores = cores, + verbose = verbose + ), + "zarr" = stop("load_format zarr:\n Not yet implemented", call. 
= FALSE) + ) - return(data_list) + return(data_list) } #' @describeIn load_xenium_folder Load from csv files #' @keywords internal -.load_xenium_folder_csv = function(path_list, - cores, - data_to_use = 'subcellular', - h5_expression = FALSE, - h5_gene_ids = 'symbols', - gene_column_index = 1, - verbose = TRUE) { - - # initialize return vars - feat_meta = tx_dt = bound_dt_list = cell_meta = agg_expr = NULL - - vmsg("Loading feature metadata...", .v = verbose) - # updated for pipeline v1.6 json format - fdata_path <- path_list$panel_meta_path[[1]] - fdata_ext <- GiottoUtils::file_extension(fdata_path) - if ("json" %in% fdata_ext) { - feat_meta <- .load_xenium_panel_json(path = fdata_path, gene_ids = h5_gene_ids) - } else { - feat_meta <- data.table::fread(fdata_path, nThread = cores) - colnames(feat_meta)[[1]] <- 'feat_ID' - } - - # **** subcellular info **** - if(data_to_use == 'subcellular') { - # append missing QC probe info to feat_meta - if(isTRUE(h5_expression)) { - h5 = hdf5r::H5File$new(path_list$agg_expr_path) - tryCatch({ - root = names(h5) - feature_id = h5[[paste0(root, "/features/id")]][] - feature_info = h5[[paste0(root,"/features/feature_type")]][] - feature_names = h5[[paste0(root, "/features/name")]][] - features_dt = data.table::data.table( - 'id' = feature_id, - 'name' = feature_names, - 'feature_type' = feature_info - ) - }, finally = { - h5$close_all() - }) +.load_xenium_folder_csv <- function(path_list, + cores, + data_to_use = "subcellular", + h5_expression = FALSE, + h5_gene_ids = "symbols", + gene_column_index = 1, + verbose = TRUE) { + # initialize return vars + feat_meta <- tx_dt <- bound_dt_list <- cell_meta <- agg_expr <- NULL + + vmsg("Loading feature metadata...", .v = verbose) + # updated for pipeline v1.6 json format + fdata_path <- path_list$panel_meta_path[[1]] + fdata_ext <- GiottoUtils::file_extension(fdata_path) + if ("json" %in% fdata_ext) { + feat_meta <- .load_xenium_panel_json(path = fdata_path, + gene_ids = h5_gene_ids) } else { - features_dt <- data.table::fread( - paste0(path_list$agg_expr_path, '/features.tsv.gz'), - header = FALSE - ) + feat_meta <- data.table::fread(fdata_path, nThread = cores) + colnames(feat_meta)[[1]] <- "feat_ID" + } + + # **** subcellular info **** + if (data_to_use == "subcellular") { + # append missing QC probe info to feat_meta + if (isTRUE(h5_expression)) { + h5 <- hdf5r::H5File$new(path_list$agg_expr_path) + tryCatch({ + root <- names(h5) + feature_id <- h5[[paste0(root, "/features/id")]][] + feature_info <- h5[[paste0(root, "/features/feature_type")]][] + feature_names <- h5[[paste0(root, "/features/name")]][] + features_dt <- data.table::data.table( + "id" = feature_id, + "name" = feature_names, + "feature_type" = feature_info + ) + }, finally = { + h5$close_all() + }) + } else { + features_dt <- data.table::fread( + paste0(path_list$agg_expr_path, "/features.tsv.gz"), + header = FALSE + ) + } + colnames(features_dt) <- c("id", "feat_ID", "feat_class") + feat_meta <- merge( + features_dt[, c(2, 3)], feat_meta, all.x = TRUE, by = "feat_ID") + + GiottoUtils::vmsg("Loading transcript level info...", .v = verbose) + tx_dt <- data.table::fread(path_list$tx_path[[1]], nThread = cores) + data.table::setnames( + x = tx_dt, + old = c("feature_name", "x_location", "y_location"), + new = c("feat_ID", "x", "y") + ) + + GiottoUtils::vmsg("Loading boundary info...", .v = verbose) + bound_dt_list <- lapply( + path_list$bound_paths, + function(x) data.table::fread(x[[1]], nThread = cores) + ) } - colnames(features_dt) = c('id', 
'feat_ID', 'feat_class') - feat_meta = merge(features_dt[,c(2,3)], feat_meta, all.x = TRUE, by = 'feat_ID') - - GiottoUtils::vmsg("Loading transcript level info...", .v = verbose) - tx_dt = data.table::fread(path_list$tx_path[[1]], nThread = cores) - data.table::setnames(x = tx_dt, - old = c('feature_name', 'x_location', 'y_location'), - new = c('feat_ID', 'x', 'y')) - - GiottoUtils::vmsg("Loading boundary info...", .v = verbose) - bound_dt_list = lapply( - path_list$bound_paths, - function(x) data.table::fread(x[[1]], nThread = cores) - ) - } - - # **** aggregate info **** - GiottoUtils::vmsg("loading cell metadata...", .v = verbose) - cell_meta = data.table::fread(path_list$cell_meta_path[[1]], nThread = cores) - - if(data_to_use == 'aggregate') { - GiottoUtils::vmsg("Loading aggregated expression...", .v = verbose) - if (isTRUE(h5_expression)) agg_expr = get10Xmatrix_h5( - path_to_data = path_list$agg_expr_path, - gene_ids = h5_gene_ids, - remove_zero_rows = TRUE, - split_by_type = TRUE - ) - else agg_expr = get10Xmatrix( - path_to_data = path_list$agg_expr_path, - gene_column_index = gene_column_index, - remove_zero_rows = TRUE, - split_by_type = TRUE - ) - } - data_list = list( - 'feat_meta' = feat_meta, - 'tx_dt' = tx_dt, - 'bound_dt_list' = bound_dt_list, - 'cell_meta' = cell_meta, - 'agg_expr' = agg_expr - ) + # **** aggregate info **** + GiottoUtils::vmsg("loading cell metadata...", .v = verbose) + cell_meta <- data.table::fread( + path_list$cell_meta_path[[1]], nThread = cores) + + if (data_to_use == "aggregate") { + GiottoUtils::vmsg("Loading aggregated expression...", .v = verbose) + if (isTRUE(h5_expression)) { + agg_expr <- get10Xmatrix_h5( + path_to_data = path_list$agg_expr_path, + gene_ids = h5_gene_ids, + remove_zero_rows = TRUE, + split_by_type = TRUE + ) + } else { + agg_expr <- get10Xmatrix( + path_to_data = path_list$agg_expr_path, + gene_column_index = gene_column_index, + remove_zero_rows = TRUE, + split_by_type = TRUE + ) + } + } - return(data_list) + data_list <- list( + "feat_meta" = feat_meta, + "tx_dt" = tx_dt, + "bound_dt_list" = bound_dt_list, + "cell_meta" = cell_meta, + "agg_expr" = agg_expr + ) + return(data_list) } @@ -2608,147 +2763,159 @@ NULL #' @describeIn load_xenium_folder Load from parquet files #' @keywords internal -.load_xenium_folder_parquet = function(path_list, - cores, - data_to_use = 'subcellular', - h5_expression = FALSE, - h5_gene_ids = 'symbols', - gene_column_index = 1, - verbose = TRUE) { - - # initialize return vars - feat_meta = tx_dt = bound_dt_list = cell_meta = agg_expr = NULL - # dplyr variable - cell_id = NULL - - vmsg("Loading feature metadata...", .v = verbose) - # updated for pipeline v1.6 json format - fdata_path <- path_list$panel_meta_path[[1]] - fdata_ext <- GiottoUtils::file_extension(fdata_path) - if ("json" %in% fdata_ext) { - feat_meta <- .load_xenium_panel_json(path = fdata_path, gene_ids = h5_gene_ids) - } else { - feat_meta <- data.table::fread(fdata_path, nThread = cores) - colnames(feat_meta)[[1]] <- 'feat_ID' - } - - # **** subcellular info **** - if(data_to_use == 'subcellular') { - - # define for data.table - transcript_id = feature_name = NULL - - # append missing QC probe info to feat_meta - if(isTRUE(h5_expression)) { - h5 = hdf5r::H5File$new(path_list$agg_expr_path) - tryCatch({ - root = names(h5) - feature_id = h5[[paste0(root, "/features/id")]][] - feature_info = h5[[paste0(root,"/features/feature_type")]][] - feature_names = h5[[paste0(root, "/features/name")]][] - features_dt = data.table::data.table( 
- 'id' = feature_id, - 'name' = feature_names, - 'feature_type' = feature_info - ) - }, finally = { - h5$close_all() - }) +.load_xenium_folder_parquet <- function(path_list, + cores, + data_to_use = "subcellular", + h5_expression = FALSE, + h5_gene_ids = "symbols", + gene_column_index = 1, + verbose = TRUE) { + # initialize return vars + feat_meta <- tx_dt <- bound_dt_list <- cell_meta <- agg_expr <- NULL + # dplyr variable + cell_id <- NULL + + vmsg("Loading feature metadata...", .v = verbose) + # updated for pipeline v1.6 json format + fdata_path <- path_list$panel_meta_path[[1]] + fdata_ext <- GiottoUtils::file_extension(fdata_path) + if ("json" %in% fdata_ext) { + feat_meta <- .load_xenium_panel_json( + path = fdata_path, gene_ids = h5_gene_ids) } else { - features_dt = arrow::read_tsv_arrow(paste0(path_list$agg_expr_path, '/features.tsv.gz'), - col_names = FALSE) %>% - data.table::setDT() + feat_meta <- data.table::fread(fdata_path, nThread = cores) + colnames(feat_meta)[[1]] <- "feat_ID" } - colnames(features_dt) = c('id', 'feat_ID', 'feat_class') - feat_meta = merge(features_dt[,c(2,3)], feat_meta, all.x = TRUE, by = 'feat_ID') - - vmsg('Loading transcript level info...', .v = verbose) - tx_dt = arrow::read_parquet(file = path_list$tx_path[[1]], - as_data_frame = FALSE) %>% - dplyr::mutate(transcript_id = cast(transcript_id, arrow::string())) %>% - dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% - dplyr::mutate(feature_name = cast(feature_name, arrow::string())) %>% - as.data.frame() %>% - data.table::setDT() - data.table::setnames(x = tx_dt, - old = c('feature_name', 'x_location', 'y_location'), - new = c('feat_ID', 'x', 'y')) - vmsg('Loading boundary info...', .v = verbose) - bound_dt_list = lapply(path_list$bound_paths, function(x) { - arrow::read_parquet(file = x[[1]], as_data_frame = FALSE) %>% - dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% - as.data.frame() %>% - data.table::setDT()}) - } - # **** aggregate info **** - if(data_to_use == 'aggregate') { - vmsg('Loading cell metadata...', .v = verbose) - cell_meta = arrow::read_parquet(file = path_list$cell_meta_path[[1]], - as_data_frame = FALSE) %>% - dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% - as.data.frame() %>% - data.table::setDT() - - # NOTE: no parquet for agg_expr. 
- vmsg('Loading aggregated expression...', .v = verbose) - if(isTRUE(h5_expression)) agg_expr = get10Xmatrix_h5( - path_to_data = path_list$agg_expr_path, - gene_ids = h5_gene_ids, - remove_zero_rows = TRUE, - split_by_type = TRUE - ) - else agg_expr = get10Xmatrix( - path_to_data = path_list$agg_expr_path, - gene_column_index = gene_column_index, - remove_zero_rows = TRUE, - split_by_type = TRUE - ) - } - data_list = list( - 'feat_meta' = feat_meta, - 'tx_dt' = tx_dt, - 'bound_dt_list' = bound_dt_list, - 'cell_meta' = cell_meta, - 'agg_expr' = agg_expr - ) + # **** subcellular info **** + if (data_to_use == "subcellular") { + # define for data.table + transcript_id <- feature_name <- NULL + + # append missing QC probe info to feat_meta + if (isTRUE(h5_expression)) { + h5 <- hdf5r::H5File$new(path_list$agg_expr_path) + tryCatch({ + root <- names(h5) + feature_id <- h5[[paste0(root, "/features/id")]][] + feature_info <- h5[[paste0(root, "/features/feature_type")]][] + feature_names <- h5[[paste0(root, "/features/name")]][] + features_dt <- data.table::data.table( + "id" = feature_id, + "name" = feature_names, + "feature_type" = feature_info + ) + }, finally = { + h5$close_all() + }) + } else { + features_dt <- arrow::read_tsv_arrow(paste0( + path_list$agg_expr_path, "/features.tsv.gz"), + col_names = FALSE + ) %>% + data.table::setDT() + } + colnames(features_dt) <- c("id", "feat_ID", "feat_class") + feat_meta <- merge(features_dt[ + , c(2, 3)], feat_meta, all.x = TRUE, by = "feat_ID") + + vmsg("Loading transcript level info...", .v = verbose) + tx_dt <- arrow::read_parquet( + file = path_list$tx_path[[1]], + as_data_frame = FALSE + ) %>% + dplyr::mutate( + transcript_id = cast(transcript_id, arrow::string())) %>% + dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% + dplyr::mutate( + feature_name = cast(feature_name, arrow::string())) %>% + as.data.frame() %>% + data.table::setDT() + data.table::setnames( + x = tx_dt, + old = c("feature_name", "x_location", "y_location"), + new = c("feat_ID", "x", "y") + ) + vmsg("Loading boundary info...", .v = verbose) + bound_dt_list <- lapply(path_list$bound_paths, function(x) { + arrow::read_parquet(file = x[[1]], as_data_frame = FALSE) %>% + dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% + as.data.frame() %>% + data.table::setDT() + }) + } + # **** aggregate info **** + if (data_to_use == "aggregate") { + vmsg("Loading cell metadata...", .v = verbose) + cell_meta <- arrow::read_parquet( + file = path_list$cell_meta_path[[1]], + as_data_frame = FALSE + ) %>% + dplyr::mutate(cell_id = cast(cell_id, arrow::string())) %>% + as.data.frame() %>% + data.table::setDT() + + # NOTE: no parquet for agg_expr. 
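+        # The cell feature matrix is therefore read with the 10x-style
+        # loaders instead: get10Xmatrix_h5() when h5_expression = TRUE,
+        # otherwise get10Xmatrix() on the unpacked MEX directory, with
+        # split_by_type = TRUE returning one matrix per feature class
+        # (e.g. gene expression vs. negative controls).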
+ vmsg("Loading aggregated expression...", .v = verbose) + if (isTRUE(h5_expression)) { + agg_expr <- get10Xmatrix_h5( + path_to_data = path_list$agg_expr_path, + gene_ids = h5_gene_ids, + remove_zero_rows = TRUE, + split_by_type = TRUE + ) + } else { + agg_expr <- get10Xmatrix( + path_to_data = path_list$agg_expr_path, + gene_column_index = gene_column_index, + remove_zero_rows = TRUE, + split_by_type = TRUE + ) + } + } - return(data_list) + data_list <- list( + "feat_meta" = feat_meta, + "tx_dt" = tx_dt, + "bound_dt_list" = bound_dt_list, + "cell_meta" = cell_meta, + "agg_expr" = agg_expr + ) + return(data_list) } .load_xenium_panel_json <- function(path, gene_ids = "symbols") { - gene_ids <- match.arg(gene_ids, c("symbols", "ensembl")) - - # tested on v1.6 - j <- jsonlite::fromJSON(path) - # j$metadata # dataset meta - # j$payload # main content - # j$payload$chemistry # panel chemistry used - # j$payload$customer # panel customer - # j$payload$designer # panel designer - # j$payload$spec_version # versioning - # j$payload$panel # dataset panel stats - - panel_info <- j$payload$targets$type %>% - data.table::as.data.table() - - switch( - gene_ids, - "symbols" = data.table::setnames( - panel_info, - old = c("data.id", "data.name", "descriptor"), - new = c("ensembl", "feat_ID", "type") - ), - "ensembl" = data.table::setnames( - panel_info, - old = c("data.id", "data.name", "descriptor"), - new = c("feat_ID", "symbol", "type") + gene_ids <- match.arg(gene_ids, c("symbols", "ensembl")) + + # tested on v1.6 + j <- jsonlite::fromJSON(path) + # j$metadata # dataset meta + # j$payload # main content + # j$payload$chemistry # panel chemistry used + # j$payload$customer # panel customer + # j$payload$designer # panel designer + # j$payload$spec_version # versioning + # j$payload$panel # dataset panel stats + + panel_info <- j$payload$targets$type %>% + data.table::as.data.table() + + switch(gene_ids, + "symbols" = data.table::setnames( + panel_info, + old = c("data.id", "data.name", "descriptor"), + new = c("ensembl", "feat_ID", "type") + ), + "ensembl" = data.table::setnames( + panel_info, + old = c("data.id", "data.name", "descriptor"), + new = c("feat_ID", "symbol", "type") + ) ) - ) - return(panel_info) + return(panel_info) } @@ -2763,73 +2930,90 @@ NULL #' (ii) fragment files, or (iii) bam files. #' @param genome A string indicating the default genome to be used for all ArchR #' functions. Currently supported values include "hg19","hg38","mm9", and "mm10". -#' This value is stored as a global environment variable, not part of the ArchRProject. +#' This value is stored as a global environment variable, not part of the +#' ArchRProject. #' This can be overwritten on a per-function basis using the given function's #' geneAnnotationand genomeAnnotation parameter. For something other than one of -#' the currently supported, see createGeneAnnnotation() and createGenomeAnnnotation() -#' @param createArrowFiles_params list of parameters passed to `ArchR::createArrowFiles` +#' the currently supported, see createGeneAnnnotation() and +#' createGenomeAnnnotation() +#' @param createArrowFiles_params list of parameters passed to +#' `ArchR::createArrowFiles` #' @param ArchRProject_params list of parameters passed to `ArchR::ArchRProject` -#' @param addIterativeLSI_params list of parameters passed to `ArchR::addIterativeLSI` +#' @param addIterativeLSI_params list of parameters passed to +#' `ArchR::addIterativeLSI` #' @param threads number of threads to use. 
Default = `ArchR::getArchRThreads()` #' @param force Default = FALSE #' @param verbose Default = TRUE #' -#' @return An ArchR project with GeneScoreMatrix, TileMatrix, and TileMatrix-based LSI +#' @return An ArchR project with GeneScoreMatrix, TileMatrix, and +#' TileMatrix-based LSI #' @export #' createArchRProj <- function(fragmentsPath, - genome = c('hg19', 'hg38', 'mm9', 'mm10'), - createArrowFiles_params = list(sampleNames = 'sample1', - minTSS = 0, - minFrags = 0, - maxFrags = 1e+07, - minFragSize = 10, - maxFragSize = 2000, - offsetPlus = 0, - offsetMinus = 0, - TileMatParams = list(tileSize = 5000)), - ArchRProject_params = list(outputDirectory = getwd(), - copyArrows = FALSE), - addIterativeLSI_params = list(), - threads = ArchR::getArchRThreads(), - force = FALSE, - verbose = TRUE) { - - if(!requireNamespace('ArchR')) { - wrap_msg('ArchR is needed. Install the package using remotes::install_github("GreenleafLab/ArchR")') - } - - ## Add reference genome - wrap_msg('Loading reference genome') - ArchR::addArchRGenome(genome) - - # Creating Arrow Files - wrap_msg('Creating Arrow files') - ArrowFiles <- do.call(ArchR::createArrowFiles, - c(inputFiles = fragmentsPath, - verbose = verbose, - force = force, - createArrowFiles_params) - ) - - # Creating an ArchRProject - wrap_msg('Creating ArchRProject') - proj <- do.call(ArchR::ArchRProject, - c(list(ArrowFiles = ArrowFiles), - threads = threads, - ArchRProject_params) - ) - - # Data normalization and dimensionality reduction - wrap_msg('Running dimension reduction') - proj <- do.call(ArchR::addIterativeLSI, - c(ArchRProj = proj, - verbose = verbose, - name = "IterativeLSI", - threads = threads, - force = force, - addIterativeLSI_params) - ) + genome = c("hg19", "hg38", "mm9", "mm10"), + createArrowFiles_params = list( + sampleNames = "sample1", + minTSS = 0, + minFrags = 0, + maxFrags = 1e+07, + minFragSize = 10, + maxFragSize = 2000, + offsetPlus = 0, + offsetMinus = 0, + TileMatParams = list(tileSize = 5000) + ), + ArchRProject_params = list( + outputDirectory = getwd(), + copyArrows = FALSE + ), + addIterativeLSI_params = list(), + threads = ArchR::getArchRThreads(), + force = FALSE, + verbose = TRUE) { + if (!requireNamespace("ArchR")) { + message('ArchR is needed. Install the package using + remotes::install_github("GreenleafLab/ArchR")') + } + + ## Add reference genome + message("Loading reference genome") + ArchR::addArchRGenome(genome) + + # Creating Arrow Files + message("Creating Arrow files") + ArrowFiles <- do.call( + ArchR::createArrowFiles, + c( + inputFiles = fragmentsPath, + verbose = verbose, + force = force, + createArrowFiles_params + ) + ) + + # Creating an ArchRProject + message("Creating ArchRProject") + proj <- do.call( + ArchR::ArchRProject, + c(list(ArrowFiles = ArrowFiles), + threads = threads, + ArchRProject_params + ) + ) + + # Data normalization and dimensionality reduction + message("Running dimension reduction") + proj <- do.call( + ArchR::addIterativeLSI, + c( + ArchRProj = proj, + verbose = verbose, + name = "IterativeLSI", + threads = threads, + force = force, + addIterativeLSI_params + ) + ) } #' Create a Giotto object from an ArchR project @@ -2837,8 +3021,10 @@ createArchRProj <- function(fragmentsPath, #' @param archRproj ArchR project #' @param expression expression information #' @param expression_feat Giotto object available features (e.g. atac, rna, ...) 
-#' @param spatial_locs data.table or data.frame with coordinates for cell centroids -#' @param sampleNames A character vector containing the ArchR project sample name +#' @param spatial_locs data.table or data.frame with coordinates for cell +#' centroids +#' @param sampleNames A character vector containing the ArchR project sample +#' name #' @param ... additional arguments passed to `createGiottoObject` #' #' @return A Giotto object with at least an atac or epigenetic modality @@ -2846,96 +3032,109 @@ createArchRProj <- function(fragmentsPath, #' @export #' createGiottoObjectfromArchR <- function(archRproj, - expression = NULL, - expression_feat = 'atac', - spatial_locs = NULL, - sampleNames = 'sample1', - ...) { - # extract GeneScoreMatrix - GeneScoreMatrix_summarizedExperiment = ArchR::getMatrixFromProject(archRproj) - GeneScoreMatrix = slot(slot(GeneScoreMatrix_summarizedExperiment, 'assays'), 'data')[['GeneScoreMatrix']] - - ## get cell names - cell_names = colnames(GeneScoreMatrix) - cell_names = gsub(paste0(sampleNames,'#'),'',cell_names) - cell_names = gsub('-1','',cell_names) - - ## get gene names - gene_names = slot(GeneScoreMatrix_summarizedExperiment,'elementMetadata')[['name']] - - ## replace colnames with cell names - colnames(GeneScoreMatrix) = cell_names - - ## replace rownames with gene names - rownames(GeneScoreMatrix) = gene_names - - - if(!is.null(expression)) { - expression_matrix = data.table::fread(expression) - - expression_cell_names = colnames(expression_matrix) - cell_names = intersect(cell_names, expression_cell_names) - - expression_matrix = Matrix::Matrix(as.matrix(expression_matrix[,-1]), - dimnames = list(expression_matrix[[1]], - colnames(expression_matrix[,-1])), - sparse = T) - - expression = expression_matrix[, cell_names] - - GeneScoreMatrix = GeneScoreMatrix[, cell_names] - } - - - ## filter spatial locations - if(!is.null(spatial_locs)) { - x = read.csv(spatial_locs) - x = x[x$cell_ID %in% cell_names,] - spatial_locs = x - } - - # Creating GiottoObject - wrap_msg('Creating GiottoObject') - - if(!is.null(expression)) { - gobject <- createGiottoObject(expression = list(GeneScoreMatrix = GeneScoreMatrix, - raw = expression), - expression_feat = expression_feat, - spatial_locs = spatial_locs, - ...) - } else { - gobject <- createGiottoObject(expression = list(GeneScoreMatrix = GeneScoreMatrix), - expression_feat = expression_feat, - spatial_locs = spatial_locs, - ...) - } - - # add LSI dimension reduction - coordinates = slot(archRproj,'reducedDims')[['IterativeLSI']][['matSVD']] - - ## clean cell names - lsi_cell_names = rownames(coordinates) - lsi_cell_names = gsub(paste0(sampleNames,'#'),'',lsi_cell_names) - lsi_cell_names = gsub('-1','',lsi_cell_names) - - rownames(coordinates) = lsi_cell_names - - coordinates = coordinates[cell_names,] - - dimension_reduction = Giotto::createDimObj(coordinates = coordinates, - name = 'lsi', - spat_unit = 'cell', - feat_type = expression_feat[1], - method = 'lsi') - gobject <- setDimReduction(gobject, - dimension_reduction, - spat_unit = 'cell', - feat_type = expression_feat[1], - name = 'lsi', - reduction_method = 'lsi') - - return(gobject) -} + expression = NULL, + expression_feat = "atac", + spatial_locs = NULL, + sampleNames = "sample1", + ...) 
{ + # extract GeneScoreMatrix + GeneScoreMatrix_summarizedExperiment <- ArchR::getMatrixFromProject( + archRproj) + GeneScoreMatrix <- slot(slot( + GeneScoreMatrix_summarizedExperiment, "assays"), + "data")[["GeneScoreMatrix"]] + + ## get cell names + cell_names <- colnames(GeneScoreMatrix) + cell_names <- gsub(paste0(sampleNames, "#"), "", cell_names) + cell_names <- gsub("-1", "", cell_names) + + ## get gene names + gene_names <- slot(GeneScoreMatrix_summarizedExperiment, + "elementMetadata")[["name"]] + + ## replace colnames with cell names + colnames(GeneScoreMatrix) <- cell_names + + ## replace rownames with gene names + rownames(GeneScoreMatrix) <- gene_names + + + if (!is.null(expression)) { + expression_matrix <- data.table::fread(expression) + + expression_cell_names <- colnames(expression_matrix) + cell_names <- intersect(cell_names, expression_cell_names) + + expression_matrix <- Matrix::Matrix(as.matrix(expression_matrix[, -1]), + dimnames = list( + expression_matrix[[1]], + colnames(expression_matrix[, -1]) + ), + sparse = TRUE + ) + + expression <- expression_matrix[, cell_names] + + GeneScoreMatrix <- GeneScoreMatrix[, cell_names] + } + ## filter spatial locations + if (!is.null(spatial_locs)) { + x <- read.csv(spatial_locs) + x <- x[x$cell_ID %in% cell_names, ] + spatial_locs <- x + } + # Creating GiottoObject + message("Creating GiottoObject") + + if (!is.null(expression)) { + gobject <- createGiottoObject( + expression = list( + GeneScoreMatrix = GeneScoreMatrix, + raw = expression + ), + expression_feat = expression_feat, + spatial_locs = spatial_locs, + ... + ) + } else { + gobject <- createGiottoObject( + expression = list(GeneScoreMatrix = GeneScoreMatrix), + expression_feat = expression_feat, + spatial_locs = spatial_locs, + ... + ) + } + + # add LSI dimension reduction + coordinates <- slot(archRproj, "reducedDims")[["IterativeLSI"]][["matSVD"]] + + ## clean cell names + lsi_cell_names <- rownames(coordinates) + lsi_cell_names <- gsub(paste0(sampleNames, "#"), "", lsi_cell_names) + lsi_cell_names <- gsub("-1", "", lsi_cell_names) + + rownames(coordinates) <- lsi_cell_names + + coordinates <- coordinates[cell_names, ] + + dimension_reduction <- Giotto::createDimObj( + coordinates = coordinates, + name = "lsi", + spat_unit = "cell", + feat_type = expression_feat[1], + method = "lsi" + ) + gobject <- setDimReduction(gobject, + dimension_reduction, + spat_unit = "cell", + feat_type = expression_feat[1], + name = "lsi", + reduction_method = "lsi" + ) + + return(gobject) +} diff --git a/R/cross_section.R b/R/cross_section.R index 10ec0fd70..560a6b2fb 100644 --- a/R/cross_section.R +++ b/R/cross_section.R @@ -8,44 +8,55 @@ #' @description create a crossSection object #' @param name name of cross section object. (default = cross_section) #' @param method method to define the cross section plane. -#' @param thickness_unit unit of the virtual section thickness. If "cell", average size of the observed cells is used as length unit. If "natural", the unit of cell location coordinates is used.(default = cell) +#' @param thickness_unit unit of the virtual section thickness. If "cell", +#' average size of the observed cells is used as length unit. If "natural", +#' the unit of cell location coordinates is used.(default = cell) #' @param slice_thickness thickness of slice -#' @param cell_distance_estimate_method method to estimate average distance between neighobring cells. 
(default = mean) -#' @param extend_ratio deciding the span of the cross section meshgrid, as a ratio of extension compared to the borders of the virtual tissue section. (default = 0.2) -#' @param plane_equation a numerical vector of length 4, in the form of c(A,B,C,D), which defines plane Ax+By+Cz=D. -#' @param mesh_grid_n number of meshgrid lines to generate along both directions for the cross section plane. +#' @param cell_distance_estimate_method method to estimate average distance +#' between neighboring cells. (default = mean) +#' @param extend_ratio deciding the span of the cross section meshgrid, as a +#' ratio of extension compared to the borders of the virtual tissue section. +#' (default = 0.2) +#' @param plane_equation a numerical vector of length 4, in the form of +#' c(A,B,C,D), which defines plane Ax+By+Cz=D. +#' @param mesh_grid_n number of meshgrid lines to generate along both +#' directions for the cross section plane. #' @param mesh_obj object that stores the cross section meshgrid information. #' @param cell_subset cells selected by the cross section -#' @param cell_subset_spatial_locations locations of cells selected by the cross section -#' @param cell_subset_projection_locations 3D projection coordinates of selected cells onto the cross section plane +#' @param cell_subset_spatial_locations locations of cells selected by the +#' cross section +#' @param cell_subset_projection_locations 3D projection coordinates of +#' selected cells onto the cross section plane #' @param cell_subset_projection_PCA pca of projection coordinates -#' @param cell_subset_projection_coords 2D PCA coordinates of selected cells in the cross section plane -create_crossSection_object <- function(name=NULL, - method=NULL, - thickness_unit=NULL, - slice_thickness=NULL, - cell_distance_estimate_method=NULL, - extend_ratio=NULL, - plane_equation=NULL, - mesh_grid_n=NULL, - mesh_obj=NULL, - cell_subset=NULL, - cell_subset_spatial_locations=NULL, - cell_subset_projection_locations=NULL, - cell_subset_projection_PCA=NULL, - cell_subset_projection_coords=NULL){ - - crossSection_obj = list("method"=method, - "thickness_unit"=thickness_unit, - "slice_thickness" = slice_thickness, - "plane_equation"=plane_equation, - "mesh_grid_n"=mesh_grid_n, - "mesh_obj"=mesh_obj, - "cell_subset"=cell_subset, - "cell_subset_spatial_locations"=cell_subset_spatial_locations, - "cell_subset_projection_locations"=cell_subset_projection_locations, - "cell_subset_projection_PCA"=cell_subset_projection_PCA, - "cell_subset_projection_coords"=cell_subset_projection_coords) +#' @param cell_subset_projection_coords 2D PCA coordinates of selected cells +#' in the cross section plane +create_crossSection_object <- function(name = NULL, + method = NULL, + thickness_unit = NULL, + slice_thickness = NULL, + cell_distance_estimate_method = NULL, + extend_ratio = NULL, + plane_equation = NULL, + mesh_grid_n = NULL, + mesh_obj = NULL, + cell_subset = NULL, + cell_subset_spatial_locations = NULL, + cell_subset_projection_locations = NULL, + cell_subset_projection_PCA = NULL, + cell_subset_projection_coords = NULL) { + crossSection_obj <- list( + "method" = method, + "thickness_unit" = thickness_unit, + "slice_thickness" = slice_thickness, + "plane_equation" = plane_equation, + "mesh_grid_n" = mesh_grid_n, + "mesh_obj" = mesh_obj, + "cell_subset" = cell_subset, + "cell_subset_spatial_locations" = cell_subset_spatial_locations, + "cell_subset_projection_locations" = cell_subset_projection_locations, + "cell_subset_projection_PCA" = 
cell_subset_projection_PCA, + "cell_subset_projection_coords" = cell_subset_projection_coords + ) } #' @title read_crossSection @@ -56,30 +67,38 @@ create_crossSection_object <- function(name=NULL, #' @param spatial_network_name spatial_network_name #' @keywords internal read_crossSection <- function(gobject, - name = NULL, - spatial_network_name = NULL){ - if(is.null(spatial_network_name)){ - stop("spatial_network_name is not specified.") - }else if (!is.element(spatial_network_name, names(slot(gobject, 'spatial_network')))){ - stop(paste0(spatial_network_name, " has not been created.")) - }else { - sp_network_obj = get_spatialNetwork(gobject, - name = spatial_network_name, - output = 'spatialNetworkObj') - if (length(slot(sp_network_obj, 'crossSectionObjects'))==0){ - stop("No cross section object has been created.") - }else if (is.null(name)){ - sprintf("cross section object is not specified, reading the last one %s from the existing list", - names(slot(sp_network_obj, 'crossSectionObjects'))[length(slot(sp_network_obj, 'crossSectionObjects'))]) - crossSection_obj = slot(sp_network_obj, 'crossSectionObjects')[[length(slot(sp_network_obj, 'crossSectionObjects'))]] - }else if(!is.element(name,names(slot(sp_network_obj, 'crossSectionObjects')))){ - stop(paste0(name, " has not been created.")) - } - else{ - crossSection_obj = slot(sp_network_obj, 'crossSectionObjects')[[name]] + name = NULL, + spatial_network_name = NULL) { + if (is.null(spatial_network_name)) { + stop("spatial_network_name is not specified.") + } else if (!is.element( + spatial_network_name, names(slot(gobject, "spatial_network")))) { + stop(paste0(spatial_network_name, " has not been created.")) + } else { + sp_network_obj <- get_spatialNetwork(gobject, + name = spatial_network_name, + output = "spatialNetworkObj" + ) + if (length(slot(sp_network_obj, "crossSectionObjects")) == 0) { + stop("No cross section object has been created.") + } else if (is.null(name)) { + sprintf( + "cross section object is not specified, reading the last one %s + from the existing list", + names(slot(sp_network_obj, "crossSectionObjects"))[ + length(slot(sp_network_obj, "crossSectionObjects"))] + ) + crossSection_obj <- slot(sp_network_obj, "crossSectionObjects")[[ + length(slot(sp_network_obj, "crossSectionObjects"))]] + } else if (!is.element(name, names(slot( + sp_network_obj, "crossSectionObjects")))) { + stop(paste0(name, " has not been created.")) + } else { + crossSection_obj <- slot( + sp_network_obj, "crossSectionObjects")[[name]] + } } - } - return(crossSection_obj) + return(crossSection_obj) } @@ -92,16 +111,16 @@ read_crossSection <- function(gobject, #' @param method method #' @keywords internal estimateCellCellDistance <- function(gobject, - spatial_network_name="Delaunay_network", - method=c("mean","median") - ){ - - delaunay_network_DT = gobject@spatial_network[['thickness_unit']][[spatial_network_name]]@networkDT - - CellCellDistance = get_distance(networkDT= delaunay_network_DT, - method=method) - return(CellCellDistance) + spatial_network_name = "Delaunay_network", + method = c("mean", "median")) { + delaunay_network_DT <- gobject@spatial_network[["thickness_unit"]][[ + spatial_network_name]]@networkDT + CellCellDistance <- get_distance( + networkDT = delaunay_network_DT, + method = method + ) + return(CellCellDistance) } #' @title get_sectionThickness #' @name get_sectionThickness @@ -112,23 +131,23 @@ estimateCellCellDistance <- function(gobject, #' @param cell_distance_estimate_method cell_distance_estimate_method #' 
@param plane_equation plane_equation #' @keywords internal -get_sectionThickness <- function(gobject,thickness_unit=c("cell","natural"), - slice_thickness = 2, - spatial_network_name="Delaunay_network", - cell_distance_estimate_method = c("mean","median"), - plane_equation=NULL){ - - thickness_unit = match.arg(thickness_unit, c("cell", "natural")) - - if (thickness_unit == "cell"){ - CellCellDistance = estimateCellCellDistance(gobject, - method = cell_distance_estimate_method, - spatial_network_name = spatial_network_name) - sectionThickness = CellCellDistance*slice_thickness - }else if (thickness_unit=="natural"){ - sectionThickness = slice_thickness - } - return(sectionThickness) +get_sectionThickness <- function(gobject, thickness_unit = c("cell", "natural"), + slice_thickness = 2, + spatial_network_name = "Delaunay_network", + cell_distance_estimate_method = c("mean", "median"), + plane_equation = NULL) { + thickness_unit <- match.arg(thickness_unit, c("cell", "natural")) + + if (thickness_unit == "cell") { + CellCellDistance <- estimateCellCellDistance(gobject, + method = cell_distance_estimate_method, + spatial_network_name = spatial_network_name + ) + sectionThickness <- CellCellDistance * slice_thickness + } else if (thickness_unit == "natural") { + sectionThickness <- slice_thickness + } + return(sectionThickness) } #' @title projection_fun @@ -138,28 +157,28 @@ get_sectionThickness <- function(gobject,thickness_unit=c("cell","natural"), #' @param plane_point plane_point #' @param plane_norm plane_norm #' @keywords internal -projection_fun <- function(point_to_project,plane_point,plane_norm){ - - a = plane_norm[1] - b = plane_norm[2] - c = plane_norm[3] - x = point_to_project[1] - y = point_to_project[2] - z = point_to_project[3] - d = plane_point[1] - e = plane_point[2] - f = plane_point[3] - t = (a*d - a*x + b*e - b*y + c*f - c*z)/(a^2+b^2+c^2) - xp = x + t*a - yp = y + t*b - zp = z + t*c - projection = c(xp,yp,zp) - return(projection) +projection_fun <- function(point_to_project, plane_point, plane_norm) { + a <- plane_norm[1] + b <- plane_norm[2] + c <- plane_norm[3] + x <- point_to_project[1] + y <- point_to_project[2] + z <- point_to_project[3] + d <- plane_point[1] + e <- plane_point[2] + f <- plane_point[3] + t <- (a * d - a * x + b * e - b * y + c * f - c * z) / (a^2 + b^2 + c^2) + xp <- x + t * a + yp <- y + t * b + zp <- z + t * c + projection <- c(xp, yp, zp) + return(projection) } #' @title adapt_aspect_ratio #' @name adapt_aspect_ratio -#' @description adapt the aspact ratio after inserting cross section mesh grid lines +#' @description adapt the aspact ratio after inserting cross section mesh grid +#' lines #' @param current_ratio current_ratio #' @param cell_locations cell_locations #' @param sdimx sdimx @@ -167,36 +186,40 @@ projection_fun <- function(point_to_project,plane_point,plane_norm){ #' @param sdimz sdimz #' @param mesh_obj mesh_obj #' @keywords internal -adapt_aspect_ratio <-function(current_ratio,cell_locations, - sdimx = NULL,sdimy = NULL,sdimz = NULL, - mesh_obj=NULL){ - x_range = max(cell_locations[[sdimx]]) - min(cell_locations[[sdimx]]) - y_range = max(cell_locations[[sdimy]]) - min(cell_locations[[sdimy]]) - z_range = max(cell_locations[[sdimz]]) - min(cell_locations[[sdimz]]) - - x_mesh_range = max(mesh_obj$mesh_grid_lines$mesh_grid_lines_X) - min(mesh_obj$mesh_grid_lines$mesh_grid_lines_X) - y_mesh_range = max(mesh_obj$mesh_grid_lines$mesh_grid_lines_Y) - min(mesh_obj$mesh_grid_lines$mesh_grid_lines_Y) - z_mesh_range = 
max(mesh_obj$mesh_grid_lines$mesh_grid_lines_Z) - min(mesh_obj$mesh_grid_lines$mesh_grid_lines_Z) - - if (x_mesh_range>x_range){ - x_adapt = x_mesh_range/x_range - }else{ - x_adapt = 1 - } - if (y_mesh_range>y_range){ - y_adapt = y_mesh_range/y_range - }else{ - y_adapt = 1 - } - if (z_mesh_range>z_range){ - z_adapt = z_mesh_range/z_range - }else{ - z_adapt = 1 - } - - new_ratio = as.numeric(current_ratio)*c(as.numeric(x_adapt),as.numeric(y_adapt),as.numeric(z_adapt)) - new_ratio = new_ratio/min(new_ratio) - return(new_ratio) +adapt_aspect_ratio <- function(current_ratio, cell_locations, + sdimx = NULL, sdimy = NULL, sdimz = NULL, + mesh_obj = NULL) { + x_range <- max(cell_locations[[sdimx]]) - min(cell_locations[[sdimx]]) + y_range <- max(cell_locations[[sdimy]]) - min(cell_locations[[sdimy]]) + z_range <- max(cell_locations[[sdimz]]) - min(cell_locations[[sdimz]]) + + x_mesh_range <- max(mesh_obj$mesh_grid_lines$mesh_grid_lines_X) - min( + mesh_obj$mesh_grid_lines$mesh_grid_lines_X) + y_mesh_range <- max(mesh_obj$mesh_grid_lines$mesh_grid_lines_Y) - min( + mesh_obj$mesh_grid_lines$mesh_grid_lines_Y) + z_mesh_range <- max(mesh_obj$mesh_grid_lines$mesh_grid_lines_Z) - min( + mesh_obj$mesh_grid_lines$mesh_grid_lines_Z) + + if (x_mesh_range > x_range) { + x_adapt <- x_mesh_range / x_range + } else { + x_adapt <- 1 + } + if (y_mesh_range > y_range) { + y_adapt <- y_mesh_range / y_range + } else { + y_adapt <- 1 + } + if (z_mesh_range > z_range) { + z_adapt <- z_mesh_range / z_range + } else { + z_adapt <- 1 + } + + new_ratio <- as.numeric(current_ratio) * c( + as.numeric(x_adapt), as.numeric(y_adapt), as.numeric(z_adapt)) + new_ratio <- new_ratio / min(new_ratio) + return(new_ratio) } # mesh grid line helper functions #### @@ -207,12 +230,11 @@ adapt_aspect_ratio <-function(current_ratio,cell_locations, #' @param x x #' @param extend_ratio extend_ratio #' @keywords internal -extend_vector <- function(x,extend_ratio){ - - x_center = (max(x)+min(x))/2 - y = (x-x_center)*(extend_ratio+1)+x_center +extend_vector <- function(x, extend_ratio) { + x_center <- (max(x) + min(x)) / 2 + y <- (x - x_center) * (extend_ratio + 1) + x_center - return(y) + return(y) } #' @title find_x_y_ranges @@ -221,21 +243,21 @@ extend_vector <- function(x,extend_ratio){ #' @param data data #' @param extend_ratio extend_ratio #' @keywords internal -find_x_y_ranges <- function(data,extend_ratio){ - - x_extend = extend_vector(data[,1],extend_ratio) - y_extend = extend_vector(data[,2],extend_ratio) - - x_min = min(x_extend) - x_max = max(x_extend) - y_min = min(y_extend) - y_max = max(y_extend) - - out = list("x_min"=x_min, - "x_max"=x_max, - "y_min"=y_min, - "y_max"=y_max - ) +find_x_y_ranges <- function(data, extend_ratio) { + x_extend <- extend_vector(data[, 1], extend_ratio) + y_extend <- extend_vector(data[, 2], extend_ratio) + + x_min <- min(x_extend) + x_max <- max(x_extend) + y_min <- min(y_extend) + y_max <- max(y_extend) + + out <- list( + "x_min" = x_min, + "x_max" = x_max, + "y_min" = y_min, + "y_max" = y_max + ) } #' @title create_2d_mesh_grid_line_obj @@ -247,21 +269,27 @@ find_x_y_ranges <- function(data,extend_ratio){ #' @param y_max y_max #' @param mesh_grid_n mesh_grid_n #' @keywords internal -create_2d_mesh_grid_line_obj <- function(x_min,x_max,y_min,y_max,mesh_grid_n){ - - x_grid = seq(x_min,x_max,length.out = mesh_grid_n) - y_grid = seq(y_min,y_max,length.out = mesh_grid_n) - - mesh_grid_lines_X = cbind(matrix(rep(x_grid,mesh_grid_n),nrow = mesh_grid_n,byrow = T), - matrix(rep(x_grid,mesh_grid_n),nrow = 
mesh_grid_n,byrow = F)) +create_2d_mesh_grid_line_obj <- function( + x_min, x_max, y_min, y_max, mesh_grid_n) { + x_grid <- seq(x_min, x_max, length.out = mesh_grid_n) + y_grid <- seq(y_min, y_max, length.out = mesh_grid_n) + + mesh_grid_lines_X <- cbind( + matrix(rep(x_grid, mesh_grid_n), nrow = mesh_grid_n, byrow = TRUE), + matrix(rep(x_grid, mesh_grid_n), nrow = mesh_grid_n, byrow = FALSE) + ) - mesh_grid_lines_Y = cbind(matrix(rep(y_grid,mesh_grid_n),nrow = mesh_grid_n,byrow = F), - matrix(rep(y_grid,mesh_grid_n),nrow = mesh_grid_n,byrow = T)) + mesh_grid_lines_Y <- cbind( + matrix(rep(y_grid, mesh_grid_n), nrow = mesh_grid_n, byrow = FALSE), + matrix(rep(y_grid, mesh_grid_n), nrow = mesh_grid_n, byrow = TRUE) + ) - mesh_grid_line_obj_2d = list("mesh_grid_lines_X"=mesh_grid_lines_X, - "mesh_grid_lines_Y"=mesh_grid_lines_Y) - return(mesh_grid_line_obj_2d) + mesh_grid_line_obj_2d <- list( + "mesh_grid_lines_X" = mesh_grid_lines_X, + "mesh_grid_lines_Y" = mesh_grid_lines_Y + ) + return(mesh_grid_line_obj_2d) } #' @title reshape_to_data_point @@ -269,18 +297,20 @@ create_2d_mesh_grid_line_obj <- function(x_min,x_max,y_min,y_max,mesh_grid_n){ #' @description reshape a mesh grid line object to data point matrix #' @param mesh_grid_obj mesh_grid_obj #' @keywords internal -reshape_to_data_point <- function(mesh_grid_obj){ - - if (length(mesh_grid_obj)==3){ - data_points = cbind(as.vector(mesh_grid_obj[[1]]), - as.vector(mesh_grid_obj[[2]]), - as.vector(mesh_grid_obj[[3]])) - }else if (length(mesh_grid_obj)==2){ - data_points = cbind(as.vector(mesh_grid_obj[[1]]), - as.vector(mesh_grid_obj[[2]]) - ) - } - return(data_points) +reshape_to_data_point <- function(mesh_grid_obj) { + if (length(mesh_grid_obj) == 3) { + data_points <- cbind( + as.vector(mesh_grid_obj[[1]]), + as.vector(mesh_grid_obj[[2]]), + as.vector(mesh_grid_obj[[3]]) + ) + } else if (length(mesh_grid_obj) == 2) { + data_points <- cbind( + as.vector(mesh_grid_obj[[1]]), + as.vector(mesh_grid_obj[[2]]) + ) + } + return(data_points) } #' @title reshape_to_mesh_grid_obj @@ -289,25 +319,31 @@ reshape_to_data_point <- function(mesh_grid_obj){ #' @param data_points data_points #' @param mesh_grid_n mesh_grid_n #' @keywords internal -reshape_to_mesh_grid_obj <- function(data_points,mesh_grid_n){ - - if (dim(data_points)[2]==2){ - - mesh_grid_lines_X = matrix(data_points[,1],nrow=mesh_grid_n,byrow=F) - mesh_grid_lines_Y = matrix(data_points[,2],nrow=mesh_grid_n,byrow=F) - - mesh_grid_obj = list("mesh_grid_lines_X"=mesh_grid_lines_X, - "mesh_grid_lines_Y"=mesh_grid_lines_Y) - - }else if (dim(data_points)[2]==3){ - mesh_grid_lines_X = matrix(data_points[,1],nrow=mesh_grid_n,byrow=F) - mesh_grid_lines_Y = matrix(data_points[,2],nrow=mesh_grid_n,byrow=F) - mesh_grid_lines_Z = matrix(data_points[,3],nrow=mesh_grid_n,byrow=F) - mesh_grid_obj = list("mesh_grid_lines_X"=mesh_grid_lines_X, - "mesh_grid_lines_Y"=mesh_grid_lines_Y, - "mesh_grid_lines_Z"=mesh_grid_lines_Z) - } - return(mesh_grid_obj) +reshape_to_mesh_grid_obj <- function(data_points, mesh_grid_n) { + if (dim(data_points)[2] == 2) { + mesh_grid_lines_X <- matrix( + data_points[, 1], nrow = mesh_grid_n, byrow = FALSE) + mesh_grid_lines_Y <- matrix( + data_points[, 2], nrow = mesh_grid_n, byrow = FALSE) + + mesh_grid_obj <- list( + "mesh_grid_lines_X" = mesh_grid_lines_X, + "mesh_grid_lines_Y" = mesh_grid_lines_Y + ) + } else if (dim(data_points)[2] == 3) { + mesh_grid_lines_X <- matrix( + data_points[, 1], nrow = mesh_grid_n, byrow = FALSE) + mesh_grid_lines_Y <- matrix( + data_points[, 
2], nrow = mesh_grid_n, byrow = FALSE) + mesh_grid_lines_Z <- matrix( + data_points[, 3], nrow = mesh_grid_n, byrow = FALSE) + mesh_grid_obj <- list( + "mesh_grid_lines_X" = mesh_grid_lines_X, + "mesh_grid_lines_Y" = mesh_grid_lines_Y, + "mesh_grid_lines_Z" = mesh_grid_lines_Z + ) + } + return(mesh_grid_obj) } @@ -319,14 +355,19 @@ reshape_to_mesh_grid_obj <- function(data_points,mesh_grid_n){ #' @param center_vec center_vec #' @param mesh_grid_n mesh_grid_n #' @keywords internal -transform_2d_mesh_to_3d_mesh <- function(mesh_line_obj_2d,pca_out,center_vec,mesh_grid_n){ - - data_point_2d = reshape_to_data_point(mesh_line_obj_2d) - center_mat = matrix(rep(center_vec,dim(data_point_2d)[1]),nrow=dim(data_point_2d)[1],byrow=T) - data_point_3d = cbind(data_point_2d,rep(0,dim(data_point_2d)[1])) %*% t((pca_out$rotation))+center_mat - mesh_grid_line_obj_3d = reshape_to_mesh_grid_obj(data_point_3d,mesh_grid_n) - - return(mesh_grid_line_obj_3d) +transform_2d_mesh_to_3d_mesh <- function( + mesh_line_obj_2d, pca_out, center_vec, mesh_grid_n) { + data_point_2d <- reshape_to_data_point(mesh_line_obj_2d) + center_mat <- matrix( + rep(center_vec, dim(data_point_2d)[1]), + nrow = dim(data_point_2d)[1], byrow = TRUE) + data_point_3d <- cbind( + data_point_2d, + rep(0, dim(data_point_2d)[1])) %*% t((pca_out$rotation)) + center_mat + mesh_grid_line_obj_3d <- reshape_to_mesh_grid_obj( + data_point_3d, mesh_grid_n) + + return(mesh_grid_line_obj_3d) } #' @title get_cross_section_coordinates @@ -334,13 +375,14 @@ transform_2d_mesh_to_3d_mesh <- function(mesh_line_obj_2d,pca_out,center_vec,mes #' @description get local coordinates within cross section plane #' @param cell_subset_projection_locations cell_subset_projection_locations #' @keywords internal -get_cross_section_coordinates <- function(cell_subset_projection_locations){ - - cell_subset_projection_PCA = stats::prcomp(cell_subset_projection_locations) +get_cross_section_coordinates <- function(cell_subset_projection_locations) { + cell_subset_projection_PCA <- stats::prcomp( + cell_subset_projection_locations) - cell_subset_projection_coords = cell_subset_projection_PCA$x[,c("PC1","PC2")] + cell_subset_projection_coords <- cell_subset_projection_PCA$x[ + , c("PC1", "PC2")] - return(cell_subset_projection_coords) + return(cell_subset_projection_coords) } #' @title create_mesh_grid_lines @@ -350,25 +392,32 @@ get_cross_section_coordinates <- function(cell_subset_projection_locations){ #' @param extend_ratio extend_ratio #' @param mesh_grid_n mesh_grid_n #' @keywords internal -create_mesh_grid_lines <- function(cell_subset_projection_locations,extend_ratio,mesh_grid_n){ - - cell_subset_projection_PCA = stats::prcomp(cell_subset_projection_locations) - - cell_subset_projection_coords = cell_subset_projection_PCA$x[,c("PC1","PC2")] - - x_y_ranges = find_x_y_ranges(cell_subset_projection_coords,extend_ratio) - - mesh_line_obj_2d = create_2d_mesh_grid_line_obj(x_y_ranges$x_min, - x_y_ranges$x_max, - x_y_ranges$y_min, - x_y_ranges$y_max, - mesh_grid_n) - center_vec = apply(cell_subset_projection_locations,2,function(x) mean(x)) - mesh_grid_line_obj_3d = transform_2d_mesh_to_3d_mesh(mesh_line_obj_2d, - cell_subset_projection_PCA, - center_vec, - mesh_grid_n) - return(mesh_grid_line_obj_3d) +create_mesh_grid_lines <- function( + cell_subset_projection_locations, extend_ratio, mesh_grid_n) { + cell_subset_projection_PCA <- stats::prcomp( + cell_subset_projection_locations) + + cell_subset_projection_coords <- cell_subset_projection_PCA$x[ + , c("PC1", "PC2")] + + 
x_y_ranges <- find_x_y_ranges(cell_subset_projection_coords, extend_ratio)
+
+    mesh_line_obj_2d <- create_2d_mesh_grid_line_obj(
+        x_y_ranges$x_min,
+        x_y_ranges$x_max,
+        x_y_ranges$y_min,
+        x_y_ranges$y_max,
+        mesh_grid_n
+    )
+    center_vec <- apply(
+        cell_subset_projection_locations, 2, function(x) mean(x))
+    mesh_grid_line_obj_3d <- transform_2d_mesh_to_3d_mesh(
+        mesh_line_obj_2d,
+        cell_subset_projection_PCA,
+        center_vec,
+        mesh_grid_n
+    )
+    return(mesh_grid_line_obj_3d)
 }


@@ -379,172 +428,207 @@ create_mesh_grid_lines <- function(cell_subset_projection_locations,extend_ratio
 #' @param gobject giotto object
 #' @param spat_loc_name name of spatial locations
 #' @param name name of cress section object. (default = cross_sectino)
-#' @param spatial_network_name name of spatial network object. (default = Delaunay_network)
-#' @param thickness_unit unit of the virtual section thickness. If "cell", average size of the observed cells is used as length unit. If "natural", the unit of cell location coordinates is used.(default = cell)
+#' @param spatial_network_name name of spatial network object.
+#' (default = Delaunay_network)
+#' @param thickness_unit unit of the virtual section thickness. If "cell",
+#' average size of the observed cells is used as length unit. If "natural",
+#' the unit of cell location coordinates is used. (default = cell)
 #' @param slice_thickness thickness of slice. default = 2
-#' @param cell_distance_estimate_method method to estimate average distance between neighobring cells. (default = mean)
-#' @param extend_ratio deciding the span of the cross section meshgrid, as a ratio of extension compared to the borders of the vitural tissue section. (default = 0.2)
+#' @param cell_distance_estimate_method method to estimate average distance
+#' between neighboring cells. (default = mean)
+#' @param extend_ratio deciding the span of the cross section meshgrid, as a
+#' ratio of extension compared to the borders of the virtual tissue section.
+#' (default = 0.2)
 #' @param method method to define the cross section plane.
-#' If equation, the plane is defined by a four element numerical vector (equation) in the form of c(A,B,C,D), corresponding to a plane with equation Ax+By+Cz=D.
-#' If 3 points, the plane is define by the coordinates of 3 points, as given by point1, point2, and point3.
-#' If point and norm vector, the plane is defined by the coordinates of one point (point1) in the plane and the coordinates of one norm vector (normVector) to the plane.
-#' If point and two plane vector, the plane is defined by the coordinates of one point (point1) in the plane and the coordinates of two vectors (planeVector1, planeVector2) in the plane.
+#' If equation, the plane is defined by a four element numerical vector
+#' (equation) in the form of c(A,B,C,D), corresponding to a plane with
+#' equation Ax+By+Cz=D.
+#' If 3 points, the plane is defined by the coordinates of 3 points, as given
+#' by point1, point2, and point3.
+#' If point and norm vector, the plane is defined by the coordinates of one
+#' point (point1) in the plane and the coordinates of one norm vector
+#' (normVector) to the plane.
+#' If point and two plane vectors, the plane is defined by the coordinates of
+#' one point (point1) in the plane and the coordinates of two vectors
+#' (planeVector1, planeVector2) in the plane.
 #' (default = equation)
-#' @param equation equation required by method "equation".equations needs to be a numerical vector of length 4, in the form of c(A,B,C,D), which defines plane Ax+By+Cz=D.
-#' @param point1 coordinates of the first point required by method "3 points","point and norm vector", and "point and two plane vectors".
+#' @param equation equation required by method "equation". The equation needs
+#' to be a numerical vector of length 4, in the form of c(A,B,C,D), which
+#' defines plane Ax+By+Cz=D.
+#' @param point1 coordinates of the first point required by method
+#' "3 points", "point and norm vector", and "point and two plane vectors".
 #' @param point2 coordinates of the second point required by method "3 points"
 #' @param point3 coordinates of the third point required by method "3 points"
-#' @param normVector coordinates of the norm vector required by method "point and norm vector"
-#' @param planeVector1 coordinates of the first plane vector required by method "point and two plane vectors"
-#' @param planeVector2 coordinates of the second plane vector required by method "point and two plane vectors"
-#' @param mesh_grid_n numer of meshgrid lines to generate along both directions for the cross section plane.
+#' @param normVector coordinates of the norm vector required by method
+#' "point and norm vector"
+#' @param planeVector1 coordinates of the first plane vector required by
+#' method "point and two plane vectors"
+#' @param planeVector2 coordinates of the second plane vector required by
+#' method "point and two plane vectors"
+#' @param mesh_grid_n number of meshgrid lines to generate along both
+#' directions for the cross section plane.
 #' @param return_gobject boolean: return giotto object (default = TRUE)
 #' @return giotto object with updated spatial network slot
-#' @details Creates a virtual 2D cross section object for a given spatial network object. The users need to provide the definition of the cross section plane (see method).
+#' @details Creates a virtual 2D cross section object for a given spatial
+#' network object. The users need to provide the definition of the cross
+#' section plane (see method).
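# [Editor's sketch, not package code] The documentation above defines the plane
# as c(A, B, C, D) with Ax + By + Cz = D. A minimal, self-contained illustration
# of how one cell is tested against the virtual section; all numbers are made up:
plane <- c(A = 0, B = 0, C = 1, D = 600)    # horizontal plane z = 600
cell <- c(x = 120, y = 85, z = 600.7)       # one 3D cell centroid
dist <- abs(sum(plane[1:3] * cell) - plane[4]) / sqrt(sum(plane[1:3]^2))
dist <= 2 / 2   # TRUE: within half of a slice_thickness of 2, so the cell is kept

# Hedged usage sketch, assuming `gobject` already carries 3D spatial locations
# and a "Delaunay_network" spatial network (argument values are illustrative):
# gobject <- createCrossSection(gobject,
#     name = "cross_section", method = "equation",
#     equation = c(0, 0, 1, 600),
#     slice_thickness = 2, thickness_unit = "natural")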
#' @export createCrossSection <- function(gobject, - spat_loc_name = 'raw', - name="cross_section", - spatial_network_name = "Delaunay_network", - thickness_unit = c("cell","natural"), - slice_thickness = 2, - cell_distance_estimate_method = "mean", - extend_ratio = 0.2, - method=c("equation","3 points","point and norm vector","point and two plane vectors"), - equation=NULL, - point1=NULL,point2=NULL,point3=NULL, - normVector=NULL, - planeVector1=NULL,planeVector2=NULL, - mesh_grid_n = 20, - return_gobject = TRUE -){ - - # read spatial locations - spatial_locations = get_spatial_locations(gobject, spat_loc_name = spat_loc_name) - cell_IDs = spatial_locations[, "cell_ID"] - cell_IDs = cell_IDs$cell_ID - - colnames_to_extract = c("sdimx", "sdimy", "sdimz") - spatial_locations = spatial_locations[, colnames_to_extract] - - spatial_locations = spatial_locations@coordinates - - spatial_locations = as.matrix(spatial_locations) - rownames(spatial_locations) = cell_IDs - cell_ID_vec = c(1:nrow(spatial_locations)) - names(cell_ID_vec) = rownames(spatial_locations) - - # generate section plane equation - - method = match.arg(method, c("equation","3 points","point and norm vector","point and two plane vectors")) - - if (method == "equation"){ - if (is.null(equation)){ - print("equation was not provided.") - }else{ - plane_equation = equation - plane_equation[4] = -equation[4] - } - }else if (method == "point and norm vector"){ - if (is.null(point1)|is.null(normVector)){ - print("either point or norm vector was not provided.") - }else{ - plane_equation = c() - plane_equation[1:3] = normVector - plane_equation[4] = -point1 %*% normVector - } - }else if (method == "point and two plane vectors"){ - if(is.null(point1)|is.null(planeVector1)|is.null(planeVector2)){ - print("either point or any of the two plane vectors was not provided.") - }else{ - normVector = crossprod(planeVector1,planeVector2) - plane_equation[1:3] = normVector - plane_equation[4] = -point1 %*% normVector - } - }else if (method == "3 points"){ - if (is.null(point1)|is.null(point2)|is.null(point3)){ - print("not all three points were provided.") - }else{ - planeVector1 = point2-point1; - planeVector2 = point3-point1; - normVector = crossprod(planeVector1,planeVector2) - plane_equation[1:3] = normVector - plane_equation[4] = -point1 %*% normVector - } - } - names(plane_equation)=c("A","B","C","D") - - # determine section thickness - thickness_unit = match.arg(thickness_unit, c("cell", "natural")) - sectionThickness = get_sectionThickness(gobject,thickness_unit=thickness_unit, - slice_thickness = slice_thickness, - spatial_network_name=spatial_network_name, - cell_distance_estimate_method = cell_distance_estimate_method, - plane_equation=plane_equation) - - max_distance_to_section_plane = sectionThickness/2 - - # calculate distances to cross section - spatial_locations_mat = cbind(spatial_locations,as.matrix(rep(1,dim(spatial_locations)[1]))) - norm_vec <- function(x) sqrt(sum(x^2)) - distance_to_plane_vector = abs(spatial_locations_mat %*% as.matrix(plane_equation)/norm_vec(plane_equation[1:3])) - - # select cells within section ### - cell_subset = distance_to_plane_vector<=max_distance_to_section_plane - - # project the selected cells onto the section plane ### - cell_subset_spatial_locations = spatial_locations[cell_subset,] - - ## find a point on the section plane ## - if (plane_equation["A"]!=0){ - plane_point = c(-plane_equation["D"]/plane_equation["A"],0,0) - }else if (plane_equation["B"]!=0){ - plane_point = 
c(0,-plane_equation["D"]/plane_equation["B"],0) - }else if (plane_equation["C"]!=0){ - plane_point = c(0,0,-plane_equation["D"]/plane_equation["C"]) - } - ## find the projection Xp,Yp,Zp coordinates ## - cell_subset_projection_locations = t(apply(cell_subset_spatial_locations,1,function(x) projection_fun(x,plane_point = plane_point, plane_norm = plane_equation[1:3]))) - - # get the local coordinates of selected cells on the section plane - cell_subset_projection_PCA = stats::prcomp(cell_subset_projection_locations) - cell_subset_projection_coords = get_cross_section_coordinates(cell_subset_projection_locations) - - # create mesh grid lines for the cross section ### - mesh_grid_lines = create_mesh_grid_lines(cell_subset_projection_locations,extend_ratio,mesh_grid_n) - mesh_obj = list("mesh_grid_lines" = mesh_grid_lines) - - ### save and update the spatial object ### - - crossSection_obj <- create_crossSection_object(method=method, - thickness_unit=thickness_unit, - slice_thickness=slice_thickness, - cell_distance_estimate_method=cell_distance_estimate_method, - extend_ratio=extend_ratio, - plane_equation=plane_equation,mesh_grid_n=mesh_grid_n, - mesh_obj=mesh_obj,cell_subset=cell_subset, - cell_subset_spatial_locations=cell_subset_spatial_locations, - cell_subset_projection_locations=cell_subset_projection_locations, - cell_subset_projection_PCA=cell_subset_projection_PCA, - cell_subset_projection_coords=cell_subset_projection_coords) - - - if (return_gobject == TRUE) { - - cs_names = names(gobject@spatial_network[[spatial_network_name]]$crossSectionObjects) - if (name %in% cs_names) { - cat("\n ", name, " has already been used, will be overwritten \n") + spat_loc_name = "raw", + name = "cross_section", + spatial_network_name = "Delaunay_network", + thickness_unit = c("cell", "natural"), + slice_thickness = 2, + cell_distance_estimate_method = "mean", + extend_ratio = 0.2, + method = c("equation", "3 points", "point and norm vector", + "point and two plane vectors"), + equation = NULL, + point1 = NULL, point2 = NULL, point3 = NULL, + normVector = NULL, + planeVector1 = NULL, planeVector2 = NULL, + mesh_grid_n = 20, + return_gobject = TRUE) { + # read spatial locations + spatial_locations <- getSpatialLocations(gobject, + name = spat_loc_name) + cell_IDs <- spatial_locations[, "cell_ID"] + cell_IDs <- cell_IDs$cell_ID + + colnames_to_extract <- c("sdimx", "sdimy", "sdimz") + spatial_locations <- spatial_locations[, colnames_to_extract] + + spatial_locations <- spatial_locations@coordinates + + spatial_locations <- as.matrix(spatial_locations) + rownames(spatial_locations) <- cell_IDs + cell_ID_vec <- c(1:nrow(spatial_locations)) + names(cell_ID_vec) <- rownames(spatial_locations) + + # generate section plane equation + + method <- match.arg( + method, + c("equation", "3 points", "point and norm vector", + "point and two plane vectors")) + + if (method == "equation") { + if (is.null(equation)) { + message("equation was not provided.") + } else { + plane_equation <- equation + plane_equation[4] <- -equation[4] + } + } else if (method == "point and norm vector") { + if (is.null(point1) | is.null(normVector)) { + message("either point or norm vector was not provided.") + } else { + plane_equation <- c() + plane_equation[1:3] <- normVector + plane_equation[4] <- -point1 %*% normVector + } + } else if (method == "point and two plane vectors") { + if (is.null(point1) | is.null(planeVector1) | is.null(planeVector2)) { + message("either point or any of the two plane vectors was not + provided.") + } 
else { + normVector <- crossprod(planeVector1, planeVector2) + plane_equation[1:3] <- normVector + plane_equation[4] <- -point1 %*% normVector + } + } else if (method == "3 points") { + if (is.null(point1) | is.null(point2) | is.null(point3)) { + message("not all three points were provided.") + } else { + planeVector1 <- point2 - point1 + planeVector2 <- point3 - point1 + normVector <- crossprod(planeVector1, planeVector2) + plane_equation[1:3] <- normVector + plane_equation[4] <- -point1 %*% normVector + } } - gobject@spatial_network[[spatial_network_name]]$crossSectionObjects[[name]] = crossSection_obj + names(plane_equation) <- c("A", "B", "C", "D") + + # determine section thickness + thickness_unit <- match.arg(thickness_unit, c("cell", "natural")) + sectionThickness <- get_sectionThickness(gobject, + thickness_unit = thickness_unit, + slice_thickness = slice_thickness, + spatial_network_name = spatial_network_name, + cell_distance_estimate_method = cell_distance_estimate_method, + plane_equation = plane_equation + ) - return(gobject) + max_distance_to_section_plane <- sectionThickness / 2 + + # calculate distances to cross section + spatial_locations_mat <- cbind( + spatial_locations, as.matrix(rep(1, dim(spatial_locations)[1]))) + norm_vec <- function(x) sqrt(sum(x^2)) + distance_to_plane_vector <- abs(spatial_locations_mat %*% as.matrix( + plane_equation) / norm_vec(plane_equation[1:3])) + + # select cells within section ### + cell_subset <- distance_to_plane_vector <= max_distance_to_section_plane + + # project the selected cells onto the section plane ### + cell_subset_spatial_locations <- spatial_locations[cell_subset, ] + + ## find a point on the section plane ## + if (plane_equation["A"] != 0) { + plane_point <- c(-plane_equation["D"] / plane_equation["A"], 0, 0) + } else if (plane_equation["B"] != 0) { + plane_point <- c(0, -plane_equation["D"] / plane_equation["B"], 0) + } else if (plane_equation["C"] != 0) { + plane_point <- c(0, 0, -plane_equation["D"] / plane_equation["C"]) + } + ## find the projection Xp,Yp,Zp coordinates ## + cell_subset_projection_locations <- t(apply( + cell_subset_spatial_locations, 1, + function(x) projection_fun(x, plane_point = plane_point, + plane_norm = plane_equation[1:3]))) + + # get the local coordinates of selected cells on the section plane + cell_subset_projection_PCA <- stats::prcomp( + cell_subset_projection_locations) + cell_subset_projection_coords <- get_cross_section_coordinates( + cell_subset_projection_locations) + + # create mesh grid lines for the cross section ### + mesh_grid_lines <- create_mesh_grid_lines( + cell_subset_projection_locations, extend_ratio, mesh_grid_n) + mesh_obj <- list("mesh_grid_lines" = mesh_grid_lines) + + ### save and update the spatial object ### + + crossSection_obj <- create_crossSection_object( + method = method, + thickness_unit = thickness_unit, + slice_thickness = slice_thickness, + cell_distance_estimate_method = cell_distance_estimate_method, + extend_ratio = extend_ratio, + plane_equation = plane_equation, mesh_grid_n = mesh_grid_n, + mesh_obj = mesh_obj, cell_subset = cell_subset, + cell_subset_spatial_locations = cell_subset_spatial_locations, + cell_subset_projection_locations = cell_subset_projection_locations, + cell_subset_projection_PCA = cell_subset_projection_PCA, + cell_subset_projection_coords = cell_subset_projection_coords + ) - } - else { - return(crossSection_obj) - } + if (return_gobject == TRUE) { + cs_names <- names(gobject@spatial_network[[ + 
spatial_network_name]]$crossSectionObjects) + if (name %in% cs_names) { + cat(name, " has already been used, will be overwritten") + } + gobject@spatial_network[[spatial_network_name]]$crossSectionObjects[[ + name]] <- crossSection_obj + return(gobject) + } else { + return(crossSection_obj) + } } @@ -553,13 +637,15 @@ createCrossSection <- function(gobject, #### #' @title crossSectionGenePlot #' @name crossSectionGenePlot -#' @description Visualize cells and gene expression in a virtual cross section according to spatial coordinates +#' @description Visualize cells and gene expression in a virtual cross section +#' according to spatial coordinates #' @param gobject giotto object #' @param spat_loc_name name of spatial locations #' @param crossSection_obj crossSection object #' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @param ... parameters for spatFeatPlot2D #' @return ggplot #' @details Description of parameters. @@ -567,203 +653,209 @@ createCrossSection <- function(gobject, #' @seealso [GiottoVisuals::spatGenePlot3D] and [GiottoVisuals::spatFeatPlot2D] #' @export crossSectionGenePlot <- function( - gobject=NULL, - spat_loc_name = 'raw', - crossSection_obj=NULL, - name=NULL, - spatial_network_name = "Delaunay_network", - default_save_name = "crossSectionGenePlot", - ... -) { - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - } else { - crossSection_obj = read_crossSection( - gobject, - name=name, - spatial_network_name = spatial_network_name + gobject = NULL, + spat_loc_name = "raw", + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + default_save_name = "crossSectionGenePlot", + ...) { + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection( + gobject, + name = name, + spatial_network_name = spatial_network_name + ) + } + + cell_subset <- crossSection_obj$cell_subset + cell_subset_projection_coords <- crossSection_obj$cell_subset_projection_coords + # modify gobject based on crossSection object + subset_cell_IDs <- gobject@cell_metadata$cell_ID[cell_subset] + temp_gobject <- subsetGiotto(gobject = gobject, cell_ids = subset_cell_IDs) + temp_gobject@spatial_locs[[spat_loc_name]]$sdimx <- cell_subset_projection_coords[, 1] + temp_gobject@spatial_locs[[spat_loc_name]]$sdimy <- cell_subset_projection_coords[, 2] + temp_gobject@spatial_locs[[spat_loc_name]]$sdimz <- rep(0, dim(cell_subset_projection_coords)[1]) + # call spatFeatPlot2D to generate the plots + GiottoVisuals::spatFeatPlot2D( + gobject = temp_gobject, + spatial_network_name = spatial_network_name, + default_save_name = default_save_name, + ... 
) - } - - cell_subset = crossSection_obj$cell_subset - cell_subset_projection_coords = crossSection_obj$cell_subset_projection_coords - # modify gobject based on crossSection object - subset_cell_IDs = gobject@cell_metadata$cell_ID[cell_subset] - temp_gobject = subsetGiotto(gobject = gobject, cell_ids = subset_cell_IDs) - temp_gobject@spatial_locs[[spat_loc_name]]$sdimx=cell_subset_projection_coords[,1] - temp_gobject@spatial_locs[[spat_loc_name]]$sdimy=cell_subset_projection_coords[,2] - temp_gobject@spatial_locs[[spat_loc_name]]$sdimz=rep(0,dim(cell_subset_projection_coords)[1]) - # call spatFeatPlot2D to generate the plots - GiottoVisuals::spatFeatPlot2D( - gobject = temp_gobject, - spatial_network_name = spatial_network_name, - default_save_name = default_save_name, - ... - ) } #### #' @title crossSectionPlot #' @name crossSectionPlot -#' @description Visualize cells in a virtual cross section according to spatial coordinates +#' @description Visualize cells in a virtual cross section according to +#' spatial coordinates #' @param gobject giotto object #' @param spat_loc_name name of spatial locations #' @param feat_type feature type -#' @param crossSection_obj cross section object as alternative input. default = NULL. +#' @param crossSection_obj cross section object as alternative input. +#' default = NULL. #' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @param ... parameters for spatPlot2D #' @return ggplot #' @details Description of parameters. #' @export #' @seealso \code{\link{crossSectionPlot}} -crossSectionPlot <-function(gobject, - spat_loc_name = 'raw', - feat_type = NULL, - crossSection_obj = NULL, - name = NULL, - spatial_network_name = "Delaunay_network", - default_save_name = "crossSectionPlot", - ...) { - - - # specify feat_type - if(is.null(feat_type)) { - feat_type = gobject@expression_feat[[1]] - } - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - }else{ - crossSection_obj = read_crossSection(gobject, - name = name, - spatial_network_name = spatial_network_name) - } - - - cell_subset = crossSection_obj$cell_subset - cell_subset_projection_coords = crossSection_obj$cell_subset_projection_coords - - # modify gobject based on crossSection object - subset_cell_IDs = gobject@cell_metadata[[feat_type]]$cell_ID[cell_subset] - temp_gobject = subsetGiotto(gobject, - feat_type = feat_type, - cell_ids = subset_cell_IDs) - temp_gobject@spatial_locs[[spat_loc_name]]$sdimx=cell_subset_projection_coords[,1] - temp_gobject@spatial_locs[[spat_loc_name]]$sdimy=cell_subset_projection_coords[,2] - temp_gobject@spatial_locs[[spat_loc_name]]$sdimz=rep(0,dim(cell_subset_projection_coords)[1]) - - # call spatFeatPlot2D to generate the plots - spatPlot2D(gobject = temp_gobject, - feat_type = feat_type, - spatial_network_name = spatial_network_name, - default_save_name = default_save_name, - ...) +crossSectionPlot <- function(gobject, + spat_loc_name = "raw", + feat_type = NULL, + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + default_save_name = "crossSectionPlot", + ...) 
{ + # specify feat_type + if (is.null(feat_type)) { + feat_type <- gobject@expression_feat[[1]] + } + + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection(gobject, + name = name, + spatial_network_name = spatial_network_name + ) + } + cell_subset <- crossSection_obj$cell_subset + cell_subset_projection_coords <- crossSection_obj$cell_subset_projection_coords + + # modify gobject based on crossSection object + subset_cell_IDs <- gobject@cell_metadata[[feat_type]]$cell_ID[cell_subset] + temp_gobject <- subsetGiotto(gobject, + feat_type = feat_type, + cell_ids = subset_cell_IDs + ) + temp_gobject@spatial_locs[[spat_loc_name]]$sdimx <- cell_subset_projection_coords[, 1] + temp_gobject@spatial_locs[[spat_loc_name]]$sdimy <- cell_subset_projection_coords[, 2] + temp_gobject@spatial_locs[[spat_loc_name]]$sdimz <- rep(0, dim(cell_subset_projection_coords)[1]) + + # call spatFeatPlot2D to generate the plots + spatPlot2D( + gobject = temp_gobject, + feat_type = feat_type, + spatial_network_name = spatial_network_name, + default_save_name = default_save_name, + ... + ) } #### #' @title crossSectionGenePlot3D #' @name crossSectionGenePlot3D -#' @description Visualize cells and gene expression in a virtual cross section according to spatial coordinates +#' @description Visualize cells and gene expression in a virtual cross section +#' according to spatial coordinates #' @param gobject giotto object #' @param crossSection_obj cross section object as alternative input. default = NULL. #' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use -#' @param other_cell_color color of cells outside the cross section. default = transparent. -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param other_cell_color color of cells outside the cross section. +#' default = transparent. +#' @param default_save_name default save name for saving, don't change, change +#' save_name in save_param #' @param ... parameters for spatGenePlot3D #' @return ggplot #' @details Description of parameters. #' @export -crossSectionGenePlot3D <-function(gobject, - crossSection_obj = NULL, - name=NULL, - spatial_network_name = "Delaunay_network", - other_cell_color = alpha("lightgrey", 0), - default_save_name = "crossSectionGenePlot3D",...){ - - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - }else{ - crossSection_obj = read_crossSection(gobject,name=name,spatial_network_name = spatial_network_name) - } - - - cell_subset = crossSection_obj$cell_subset - cell_subset_projection_coords = crossSection_obj$cell_subset_projection_coords - # modify gobject based on crossSection object - subset_cell_IDs = gobject@cell_metadata$cell_ID[cell_subset] - # call spatGenePlot3D to generate the plots - spatGenePlot3D(gobject, - select_cells = subset_cell_IDs, - other_cell_color = other_cell_color, - default_save_name = default_save_name,...) +crossSectionGenePlot3D <- function(gobject, + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + other_cell_color = alpha("lightgrey", 0), + default_save_name = "crossSectionGenePlot3D", ...) 
{ + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection( + gobject, name = name, spatial_network_name = spatial_network_name) + } + + + cell_subset <- crossSection_obj$cell_subset + cell_subset_projection_coords <- crossSection_obj$cell_subset_projection_coords + # modify gobject based on crossSection object + subset_cell_IDs <- gobject@cell_metadata$cell_ID[cell_subset] + # call spatGenePlot3D to generate the plots + spatGenePlot3D(gobject, + select_cells = subset_cell_IDs, + other_cell_color = other_cell_color, + default_save_name = default_save_name, ... + ) } #### #' @title crossSectionPlot3D #' @name crossSectionPlot3D -#' @description Visualize cells in a virtual cross section according to spatial coordinates +#' @description Visualize cells in a virtual cross section according to spatial +#' coordinates #' @param gobject giotto object -#' @param crossSection_obj cross section object as alternative input. default = NULL. +#' @param crossSection_obj cross section object as alternative input. +#' default = NULL. #' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use #' @param show_other_cells display not selected cells -#' @param other_cell_color color of cells outside the cross section. default = transparent. -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param other_cell_color color of cells outside the cross section. +#' default = transparent. +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @param ... parameters for spatPlot3D #' @return ggplot #' @details Description of parameters. #' @export -crossSectionPlot3D <-function(gobject, - crossSection_obj = NULL, - name=NULL, - spatial_network_name = "Delaunay_network", - show_other_cells = T, - other_cell_color = alpha("lightgrey", 0), - default_save_name = "crossSection3D",...){ - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - }else{ - crossSection_obj = read_crossSection(gobject,name=name,spatial_network_name = spatial_network_name) - } - - cell_subset = crossSection_obj$cell_subset - cell_subset_projection_coords = crossSection_obj$cell_subset_projection_coords - # modify gobject based on crossSection object - subset_cell_IDs = gobject@cell_metadata$cell_ID[cell_subset] - # temp_gobject = subsetGiotto(gobject = gobject, cell_ids = subset_cell_IDs) - # temp_gobject@spatial_locs$sdimx=cell_subset_projection_coords[,1] - # temp_gobject@spatial_locs$sdimy=cell_subset_projection_coords[,2] - # temp_gobject@spatial_locs$sdimz=rep(0,dim(cell_subset_projection_coords)[1]) - # - # call spatPlot3D to generate the plots - spatPlot3D(gobject=gobject, - ## - select_cells = subset_cell_IDs, - ## - show_other_cells = show_other_cells, - other_cell_color = other_cell_color, - default_save_name = default_save_name,...) +crossSectionPlot3D <- function(gobject, + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + show_other_cells = TRUE, + other_cell_color = alpha("lightgrey", 0), + default_save_name = "crossSection3D", ...) 
{ + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection( + gobject, name = name, spatial_network_name = spatial_network_name) + } + + cell_subset <- crossSection_obj$cell_subset + cell_subset_projection_coords <- crossSection_obj$cell_subset_projection_coords + # modify gobject based on crossSection object + subset_cell_IDs <- gobject@cell_metadata$cell_ID[cell_subset] + + # call spatPlot3D to generate the plots + spatPlot3D( + gobject = gobject, + select_cells = subset_cell_IDs, + show_other_cells = show_other_cells, + other_cell_color = other_cell_color, + default_save_name = default_save_name, ... + ) } #### #' @title insertCrossSectionSpatPlot3D #' @name insertCrossSectionSpatPlot3D -#' @description Visualize the meshgrid lines of cross section together with cells +#' @description Visualize the meshgrid lines of cross section together with +#' cells #' @param gobject giotto object #' @param spat_loc_name name of spatial locations -#' @param crossSection_obj cross section object as alternative input. default = NULL. +#' @param crossSection_obj cross section object as alternative input. +#' default = NULL. #' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use #' @param mesh_grid_color color for the meshgrid lines @@ -775,79 +867,92 @@ crossSectionPlot3D <-function(gobject, #' @param show_other_cells display not selected cells #' @param axis_scale axis_scale #' @param custom_ratio custom_ratio -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @param ... parameters for spatPlot3D #' @return ggplot #' @details Description of parameters. #' @export insertCrossSectionSpatPlot3D <- function(gobject, - spat_loc_name = 'raw', - crossSection_obj=NULL, - name=NULL, - spatial_network_name = "Delaunay_network", - mesh_grid_color = "#1f77b4", - mesh_grid_width = 3, - mesh_grid_style = "dot", - sdimx = "sdimx", sdimy = "sdimy", sdimz = "sdimz", - show_other_cells = F, - axis_scale = c("cube", "real", "custom"), - custom_ratio = NULL, - default_save_name = "spat3D_with_cross_section",...){ - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - }else{ - crossSection_obj = read_crossSection(gobject,name=name,spatial_network_name = spatial_network_name) - } - - - - pl = spatPlot3D(gobject, - sdimx = sdimx, - sdimy = sdimy, - sdimz = sdimz, - show_other_cells = show_other_cells, - show_plot = FALSE, - return_plot = TRUE, - save_plot = FALSE, - default_save_name = default_save_name,...) 
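# [Editor's note] Both the old and the new implementations below add each mesh
# grid line to the plot as its own 3D line trace. A standalone sketch of that
# plotly call pattern, with made-up coordinates (only plotly is assumed):
pl_sketch <- plotly::plot_ly()
pl_sketch <- plotly::add_trace(pl_sketch,
    x = c(0, 10), y = c(0, 10), z = c(5, 5),
    mode = "lines", type = "scatter3d",
    line = list(color = "#1f77b4", width = 3, dash = "dot"))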
- - for (i in 1:dim(crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X)[2]){ - - pl = pl %>% plotly::add_trace(x = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X[,i], - y = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Y[,i], - z = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Z[,i], - mode = 'lines',type = 'scatter3d', - line = list(color = mesh_grid_color, width = mesh_grid_width,dash = mesh_grid_style)) - } - - current_ratio = plotly_axis_scale_3D(gobject@spatial_locs[[spat_loc_name]], - sdimx = sdimx,sdimy = sdimy,sdimz = sdimz, - mode = axis_scale,custom_ratio = custom_ratio) - - new_ratio = adapt_aspect_ratio(current_ratio,gobject@spatial_locs[[spat_loc_name]], - sdimx = sdimx,sdimy = sdimy,sdimz = sdimz, - mesh_obj=crossSection_obj$mesh_obj) - - pl = pl %>% plotly::layout(showlegend = FALSE, - scene = list( - aspectmode='manual', - aspectratio = list(x=new_ratio[[1]], - y=new_ratio[[2]], - z=new_ratio[[3]]))) - - return(pl) + spat_loc_name = "raw", + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + mesh_grid_color = "#1f77b4", + mesh_grid_width = 3, + mesh_grid_style = "dot", + sdimx = "sdimx", sdimy = "sdimy", sdimz = "sdimz", + show_other_cells = FALSE, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + default_save_name = "spat3D_with_cross_section", ...) { + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection( + gobject, name = name, spatial_network_name = spatial_network_name) + } + + + pl <- spatPlot3D(gobject, + sdimx = sdimx, + sdimy = sdimy, + sdimz = sdimz, + show_other_cells = show_other_cells, + show_plot = FALSE, + return_plot = TRUE, + save_plot = FALSE, + default_save_name = default_save_name, ... + ) + + for (i in seq_len(dim( + crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X)[2])) { + pl <- pl %>% plotly::add_trace( + x = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X[, i], + y = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Y[, i], + z = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Z[, i], + mode = "lines", type = "scatter3d", + line = list(color = mesh_grid_color, + width = mesh_grid_width, dash = mesh_grid_style) + ) + } + + current_ratio <- plotly_axis_scale_3D(gobject@spatial_locs[[spat_loc_name]], + sdimx = sdimx, sdimy = sdimy, sdimz = sdimz, + mode = axis_scale, custom_ratio = custom_ratio + ) + + new_ratio <- adapt_aspect_ratio( + current_ratio, gobject@spatial_locs[[spat_loc_name]], + sdimx = sdimx, sdimy = sdimy, sdimz = sdimz, + mesh_obj = crossSection_obj$mesh_obj + ) + pl <- pl %>% plotly::layout( + showlegend = FALSE, + scene = list( + aspectmode = "manual", + aspectratio = list( + x = new_ratio[[1]], + y = new_ratio[[2]], + z = new_ratio[[3]] + ) + ) + ) + + return(pl) } #### #' @title insertCrossSectionGenePlot3D #' @name insertCrossSectionGenePlot3D -#' @description Visualize cells and gene expression in a virtual cross section according to spatial coordinates +#' @description Visualize cells and gene expression in a virtual cross section +#' according to spatial coordinates #' @param gobject giotto object #' @param spat_loc_name name of spatial locations -#' @param crossSection_obj cross section object as alternative input. default = NULL. +#' @param crossSection_obj cross section object as alternative input. +#' default = NULL. 
#' @param name name of virtual cross section to use #' @param spatial_network_name name of spatial network to use #' @param mesh_grid_color color for the meshgrid lines @@ -862,86 +967,99 @@ insertCrossSectionSpatPlot3D <- function(gobject, #' @param show_plot show plots #' @param return_plot return ggplot object #' @param save_plot logical. directly save the plot -#' @param save_param list of saving parameters from [GiottoVisuals::all_plots_save_function] -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters from +#' [GiottoVisuals::all_plots_save_function] +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @param ... parameters for spatGenePlot3D #' @return ggplot #' @details Description of parameters. #' @md #' @export insertCrossSectionGenePlot3D <- function( - gobject, - spat_loc_name = 'raw', - crossSection_obj=NULL, - name=NULL, - spatial_network_name = "Delaunay_network", - mesh_grid_color = "#1f77b4", - mesh_grid_width = 3, - mesh_grid_style = "dot", - sdimx = "sdimx", sdimy = "sdimy", sdimz = "sdimz", - show_other_cells = F, - axis_scale = c("cube", "real", "custom"), - custom_ratio = NULL, - show_plot = NULL, return_plot = NULL, save_plot = NULL, - save_param = list(), - default_save_name = "spatGenePlot3D_with_cross_section", - ... -){ - - # load cross section object - if (!is.null(crossSection_obj)){ - crossSection_obj = crossSection_obj - }else{ - crossSection_obj = read_crossSection(gobject,name=name,spatial_network_name = spatial_network_name) - } - - pl = spatGenePlot3D(gobject, - show_other_cells = F, - axis_scale = axis_scale, - custom_ratio = custom_ratio, - show_plot = FALSE, - return_plot = TRUE, - save_plot = FALSE, - default_save_name = default_save_name,...) 
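# [Editor's note] A small, self-contained sketch of the aspect-ratio adjustment
# that adapt_aspect_ratio() performs before plotly::layout() is called below:
# an axis is stretched only when the mesh grid reaches beyond the cells, and
# the result is rescaled so the smallest factor is 1 (numbers are made up):
current_ratio <- c(1, 1, 1)                # e.g. from plotly_axis_scale_3D()
cell_range <- c(x = 100, y = 80, z = 40)   # span of the cell coordinates
mesh_range <- c(x = 120, y = 80, z = 30)   # span of the mesh grid lines
adapt <- pmax(mesh_range / cell_range, 1)  # stretch only where mesh is wider
new_ratio <- current_ratio * adapt
new_ratio <- new_ratio / min(new_ratio)    # c(1.2, 1, 1)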
- for (i in 1:dim(crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X)[2]){ - - pl = pl %>% plotly::add_trace(x = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X[,i], - y = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Y[,i], - z = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Z[,i], - mode = 'lines+markers',type = 'scatter3d',color = mesh_grid_color, - marker = list(color=alpha(mesh_grid_color,0)), - line = list(color = mesh_grid_color, width = mesh_grid_width,dash = mesh_grid_style)) - } - - current_ratio = plotly_axis_scale_3D(gobject@spatial_locs[[spat_loc_name]], - sdimx = sdimx, - sdimy = sdimy, - sdimz = sdimz, - mode = axis_scale,custom_ratio = custom_ratio) - - new_ratio = adapt_aspect_ratio(current_ratio,gobject@spatial_locs[[spat_loc_name]], - sdimx = sdimx, - sdimy = sdimy, - sdimz = sdimz, - mesh_obj = crossSection_obj$mesh_obj) - - pl = pl %>% plotly::layout(showlegend = FALSE, - scene = list( - aspectmode='manual', - aspectratio = list(x=new_ratio[[1]], - y=new_ratio[[2]], - z=new_ratio[[3]]))) - - cowplot = pl - - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = cowplot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + gobject, + spat_loc_name = "raw", + crossSection_obj = NULL, + name = NULL, + spatial_network_name = "Delaunay_network", + mesh_grid_color = "#1f77b4", + mesh_grid_width = 3, + mesh_grid_style = "dot", + sdimx = "sdimx", sdimy = "sdimy", sdimz = "sdimz", + show_other_cells = FALSE, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + show_plot = NULL, return_plot = NULL, save_plot = NULL, + save_param = list(), + default_save_name = "spatGenePlot3D_with_cross_section", + ...) { + # load cross section object + if (!is.null(crossSection_obj)) { + crossSection_obj <- crossSection_obj + } else { + crossSection_obj <- read_crossSection( + gobject, name = name, spatial_network_name = spatial_network_name) + } + + pl <- spatGenePlot3D(gobject, + show_other_cells = FALSE, + axis_scale = axis_scale, + custom_ratio = custom_ratio, + show_plot = FALSE, + return_plot = TRUE, + save_plot = FALSE, + default_save_name = default_save_name, ... 
+ ) + for (i in seq_len(dim( + crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X)[2])) { + pl <- pl %>% plotly::add_trace( + x = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_X[, i], + y = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Y[, i], + z = crossSection_obj$mesh_obj$mesh_grid_lines$mesh_grid_lines_Z[, i], + mode = "lines+markers", type = "scatter3d", color = mesh_grid_color, + marker = list(color = alpha(mesh_grid_color, 0)), + line = list(color = mesh_grid_color, + width = mesh_grid_width, dash = mesh_grid_style) + ) + } + + current_ratio <- plotly_axis_scale_3D(gobject@spatial_locs[[spat_loc_name]], + sdimx = sdimx, + sdimy = sdimy, + sdimz = sdimz, + mode = axis_scale, custom_ratio = custom_ratio + ) + + new_ratio <- adapt_aspect_ratio( + current_ratio, gobject@spatial_locs[[spat_loc_name]], + sdimx = sdimx, + sdimy = sdimy, + sdimz = sdimz, + mesh_obj = crossSection_obj$mesh_obj + ) + + pl <- pl %>% plotly::layout( + showlegend = FALSE, + scene = list( + aspectmode = "manual", + aspectratio = list( + x = new_ratio[[1]], + y = new_ratio[[2]], + z = new_ratio[[3]] + ) + ) + ) + + cowplot <- pl + + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = cowplot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } diff --git a/R/dd.R b/R/dd.R index dbcc6ff94..6312653f2 100644 --- a/R/dd.R +++ b/R/dd.R @@ -1,5 +1,3 @@ - - # ------------------------------------------------------------------------- # # This script contains reusable dummy documentation / templates for # commonly used params. @@ -225,13 +223,3 @@ NULL #' @param default_save_name default save name for saving, don't change, change save_name in save_param #' @keywords internal NULL - - - - - - - - - - diff --git a/R/differential_expression.R b/R/differential_expression.R index 772d7bcee..c28ba0af6 100644 --- a/R/differential_expression.R +++ b/R/differential_expression.R @@ -1,17 +1,18 @@ - - #' @title findScranMarkers #' @name findScranMarkers -#' @description Identify marker genes for all or selected clusters based on scran's implementation of findMarkers. +#' @description Identify marker genes for all or selected clusters based on +#' scran's implementation of findMarkers. #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type #' @param expression_values gene expression values to use #' @param cluster_column clusters to use #' @param subset_clusters selection of clusters to compare -#' @param group_1 group 1 cluster IDs from cluster_column for pairwise comparison +#' @param group_1 group 1 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_1_name custom name for group_1 clusters -#' @param group_2 group 2 cluster IDs from cluster_column for pairwise comparison +#' @param group_2 group 2 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_2_name custom name for group_2 clusters #' @param verbose be verbose (default = FALSE) #' @param ... 
additional parameters for the findMarkers function in scran @@ -21,131 +22,142 @@ #' #' To perform differential expression between custom selected groups of cells #' you need to specify the cell_ID column to parameter \emph{cluster_column} -#' and provide the individual cell IDs to the parameters \emph{group_1} and \emph{group_2} +#' and provide the individual cell IDs to the parameters \emph{group_1} and +#' \emph{group_2} #' -#' By default group names will be created by pasting the different id names within each selected group. +#' By default group names will be created by pasting the different id names +#' within each selected group. #' When you have many different ids in a single group -#' it is recommend to provide names for both groups to \emph{group_1_name} and \emph{group_2_name} +#' it is recommend to provide names for both groups to \emph{group_1_name} and +#' \emph{group_2_name} #' #' @export findScranMarkers <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - group_1 = NULL, - group_1_name = NULL, - group_2 = NULL, - group_2_name = NULL, - verbose = TRUE, - ...) { - - - # verify if optional package is installed - package_check(pkg_name = "scran", repository = "Bioc") - - - # print message with information # - if(isTRUE(verbose)) { - wrap_msg("Using 'Scran' to detect marker genes. If used in published research, please cite: - Lun ATL, McCarthy DJ, Marioni JC (2016). - 'A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.' - F1000Res., 5, 2122. doi: 10.12688/f1000research.9501.2.") - } - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression data - values = match.arg(expression_values, choices = unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - # cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - if(!cluster_column %in% colnames(cell_metadata)) { - stop('\n cluster column not found \n') - } - - # subset expression data - if(!is.null(subset_clusters)) { - - cell_metadata = cell_metadata[get(cluster_column) %in% subset_clusters] - subset_cell_IDs = cell_metadata[['cell_ID']] - expr_data = expr_data[, colnames(expr_data) %in% subset_cell_IDs] - - - } else if(!is.null(group_1) & !is.null(group_2)) { - - cell_metadata = cell_metadata[get(cluster_column) %in% c(group_1, group_2)] - - # create new pairwise group - if(!is.null(group_1_name)) { - if(!is.character(group_1_name)) stop('group_1_name needs to be a character') - group_1_name = group_1_name - } else { - group_1_name = paste0(group_1, collapse = '_') + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + group_1 = NULL, + group_1_name = NULL, + group_2 = NULL, + group_2_name = NULL, + verbose = TRUE, + ...) { + # verify if optional package is installed + package_check(pkg_name = "scran", repository = "Bioc") + + + # print message with information # + if (isTRUE(verbose)) { + message("Using 'Scran' to detect marker genes. 
If used in published + research, please cite: + Lun ATL, McCarthy DJ, Marioni JC (2016). + 'A step-by-step workflow for low-level analysis of single-cell RNA-seq + data with Bioconductor.' + F1000Res., 5, 2122. doi: 10.12688/f1000research.9501.2.") } - if(!is.null(group_2_name)) { - if(!is.character(group_2_name)) stop('group_2_name needs to be a character') - group_2_name = group_2_name - } else { - group_2_name = paste0(group_2, collapse = '_') - } + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # data.table variables - pairwise_select_comp = NULL + # expression data + values <- match.arg( + expression_values, + choices = unique(c("normalized", "scaled", "custom", + expression_values))) + expr_data <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) - cell_metadata[, pairwise_select_comp := ifelse(get(cluster_column) %in% group_1, group_1_name, group_2_name)] + # cluster column + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + if (!cluster_column %in% colnames(cell_metadata)) { + stop("cluster column not found") + } - cluster_column = 'pairwise_select_comp' + # subset expression data + if (!is.null(subset_clusters)) { + cell_metadata <- cell_metadata[get(cluster_column) %in% subset_clusters] + subset_cell_IDs <- cell_metadata[["cell_ID"]] + expr_data <- expr_data[, colnames(expr_data) %in% subset_cell_IDs] + } else if (!is.null(group_1) & !is.null(group_2)) { + cell_metadata <- cell_metadata[ + get(cluster_column) %in% c(group_1, group_2)] + + # create new pairwise group + if (!is.null(group_1_name)) { + if (!is.character(group_1_name)) + stop("group_1_name needs to be a character") + group_1_name <- group_1_name + } else { + group_1_name <- paste0(group_1, collapse = "_") + } - # expression data - subset_cell_IDs = cell_metadata[['cell_ID']] - expr_data = expr_data[, colnames(expr_data) %in% subset_cell_IDs] + if (!is.null(group_2_name)) { + if (!is.character(group_2_name)) + stop("group_2_name needs to be a character") + group_2_name <- group_2_name + } else { + group_2_name <- paste0(group_2, collapse = "_") + } - } + # data.table variables + pairwise_select_comp <- NULL + cell_metadata[, pairwise_select_comp := ifelse( + get(cluster_column) %in% group_1, group_1_name, group_2_name)] - ## SCRAN ## - marker_results = scran::findMarkers(x = expr_data, groups = cell_metadata[[cluster_column]], ...) + cluster_column <- "pairwise_select_comp" - # data.table variables - genes = cluster = feats = NULL + # expression data + subset_cell_IDs <- cell_metadata[["cell_ID"]] + expr_data <- expr_data[, colnames(expr_data) %in% subset_cell_IDs] + } - savelist = lapply(names(marker_results), FUN = function(x) { - dfr = marker_results[[x]] - DT = data.table::as.data.table(dfr) - DT[, feats := rownames(dfr)] - DT[, cluster := x] - }) + ## SCRAN ## + marker_results <- scran::findMarkers( + x = expr_data, groups = cell_metadata[[cluster_column]], ...) 
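
For reference, the pairwise mode documented above amounts to relabelling the selected cells into two groups and handing the (optionally subsetted) expression matrix to scran::findMarkers(). A minimal, illustrative call follows; the giotto object and cluster IDs are placeholders and are not part of this changeset:

    # 'my_gobject' and the cluster labels are hypothetical, for illustration only
    markers_A_vs_B <- findScranMarkers(
        gobject = my_gobject,
        expression_values = "normalized",
        cluster_column = "leiden_clus",
        group_1 = c(1, 3), group_1_name = "group_A",
        group_2 = c(2, 4), group_2_name = "group_B"
    )
    # returns a list of data.tables (one per group label), each carrying
    # scran's statistics plus the 'feats' and 'cluster' columns added below
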
+ + # data.table variables + genes <- cluster <- feats <- NULL - return(savelist) + savelist <- lapply(names(marker_results), FUN = function(x) { + dfr <- marker_results[[x]] + DT <- data.table::as.data.table(dfr) + DT[, feats := rownames(dfr)] + DT[, cluster := x] + }) + return(savelist) } #' @title findScranMarkers_one_vs_all #' @name findScranMarkers_one_vs_all -#' @description Identify marker feats for all clusters in a one vs all manner based on scran's implementation of findMarkers. +#' @description Identify marker feats for all clusters in a one vs all manner +#' based on scran's implementation of findMarkers. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -162,148 +174,167 @@ findScranMarkers <- function(gobject, #' @seealso \code{\link{findScranMarkers}} #' @export findScranMarkers_one_vs_all <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - pval = 0.01, - logFC = 0.5, - min_feats = 10, - min_genes = NULL, - verbose = TRUE, - ...) { - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # verify if optional package is installed - package_check(pkg_name = "scran", repository = "Bioc") - - # print message with information # - if(verbose) message("using 'Scran' to detect marker feats. If used in published research, please cite: - Lun ATL, McCarthy DJ, Marioni JC (2016). - 'A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.' - F1000Res., 5, 2122. doi: 10.12688/f1000research.9501.2. ") - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression data - values = match.arg(expression_values, choices = unique(c('normalized', 'scaled', 'custom', expression_values))) - - # cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - if(!cluster_column %in% colnames(cell_metadata)) { - stop('\n cluster column not found \n') - } - - # restrict to a subset of clusters - if(!is.null(subset_clusters)) { - - cell_metadata = cell_metadata[get(cluster_column) %in% subset_clusters] - subset_cell_IDs = cell_metadata[['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - cell_ids = subset_cell_IDs, - verbose = FALSE) + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + pval = 0.01, + logFC = 0.5, + min_feats = 10, + min_genes = NULL, + verbose = TRUE, + ...) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") + } + + # verify if optional package is installed + package_check(pkg_name = "scran", repository = "Bioc") + + # print message with information # + if (verbose) + message("using 'Scran' to detect marker feats. If used in published + research, please cite: Lun ATL, McCarthy DJ, Marioni JC (2016). + 'A step-by-step workflow for low-level analysis of single-cell RNA-seq + data with Bioconductor.' + F1000Res., 5, 2122. 
doi: 10.12688/f1000research.9501.2. ") + + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # expression data + values <- match.arg( + expression_values, + choices = unique(c("normalized", "scaled", "custom", + expression_values))) + + # cluster column cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - } + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + if (!cluster_column %in% colnames(cell_metadata)) { + stop("cluster column not found") + } + # restrict to a subset of clusters + if (!is.null(subset_clusters)) { + cell_metadata <- cell_metadata[get(cluster_column) %in% subset_clusters] + subset_cell_IDs <- cell_metadata[["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + cell_ids = subset_cell_IDs, + verbose = FALSE + ) + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + } - # sort uniq clusters - uniq_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) + # sort uniq clusters + uniq_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) - # save list - progressr::with_progress({ - pb = progressr::progressor(along = uniq_clusters) - result_list = lapply( - seq_along(uniq_clusters), - function(clus_i) { - selected_clus = uniq_clusters[clus_i] - other_clus = uniq_clusters[uniq_clusters != selected_clus] - if(verbose == TRUE) { - cat('\n start with cluster ', selected_clus, '\n') - } + # save list + progressr::with_progress({ + pb <- progressr::progressor(along = uniq_clusters) + result_list <- lapply( + seq_along(uniq_clusters), + function(clus_i) { + selected_clus <- uniq_clusters[clus_i] + other_clus <- uniq_clusters[uniq_clusters != selected_clus] - # one vs all markers - markers = findScranMarkers(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expression_values = values, - cluster_column = cluster_column, - group_1 = selected_clus, - group_2 = other_clus, - verbose = FALSE) - - # identify list to continue with - select_bool = unlist(lapply(markers, FUN = function(x) { - unique(x$cluster) == selected_clus - })) - selected_table = data.table::as.data.table(markers[select_bool]) - - # remove summary column from scran output if present - col_ind_keep = !grepl('summary', colnames(selected_table)) - selected_table = selected_table[, col_ind_keep, with = F] - - # change logFC.xxx name to logFC - data.table::setnames(selected_table, colnames(selected_table)[4], 'logFC') - data.table::setnames(selected_table, colnames(selected_table)[5], 'feats') - - # filter selected table - filtered_table = selected_table[logFC > 0] - filtered_table[, 'ranking' := rank(-logFC)] + if (verbose == TRUE) { + cat("start with cluster ", selected_clus) + } - # data.table variables - p.value = ranking = NULL + # one vs all markers + markers <- findScranMarkers( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + cluster_column = cluster_column, + group_1 = selected_clus, + group_2 = other_clus, + verbose = FALSE + ) - filtered_table = filtered_table[(p.value <= pval & logFC >= logFC) | (ranking <= min_feats)] + # identify list to continue with + select_bool <- 
unlist(lapply(markers, FUN = function(x) { + unique(x$cluster) == selected_clus + })) + selected_table <- data.table::as.data.table( + markers[select_bool]) - pb(message = c('cluster ', clus_i, '/', length(uniq_clusters))) - return(filtered_table) - } - ) - }) + # remove summary column from scran output if present + col_ind_keep <- !grepl("summary", colnames(selected_table)) + selected_table <- selected_table[, col_ind_keep, with = FALSE] + + # change logFC.xxx name to logFC + data.table::setnames( + selected_table, colnames(selected_table)[4], "logFC") + data.table::setnames( + selected_table, colnames(selected_table)[5], "feats") + + # filter selected table + filtered_table <- selected_table[logFC > 0] + filtered_table[, "ranking" := rank(-logFC)] + + # data.table variables + p.value <- ranking <- NULL + filtered_table <- filtered_table[ + (p.value <= pval & logFC >= logFC) | (ranking <= min_feats)] - return(do.call('rbind', result_list)) + pb(message = c("cluster ", clus_i, "/", length(uniq_clusters))) + return(filtered_table) + } + ) + }) + + return(do.call("rbind", result_list)) } #' @title findGiniMarkers #' @name findGiniMarkers -#' @description Identify marker feats for selected clusters based on gini detection and expression scores. +#' @description Identify marker feats for selected clusters based on gini +#' detection and expression scores. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit #' @param expression_values feat expression values to use #' @param cluster_column clusters to use #' @param subset_clusters selection of clusters to compare -#' @param group_1 group 1 cluster IDs from cluster_column for pairwise comparison +#' @param group_1 group 1 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_1_name custom name for group_1 clusters -#' @param group_2 group 2 cluster IDs from cluster_column for pairwise comparison +#' @param group_2 group 2 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_2_name custom name for group_2 clusters #' @param min_expr_gini_score filter on minimum gini coefficient for expression #' @param min_det_gini_score filter on minimum gini coefficient for detection @@ -313,7 +344,8 @@ findScranMarkers_one_vs_all <- function(gobject, #' @param min_genes deprecated, use min_feats #' @return data.table with marker feats #' @details -#' Detection of marker feats using the \url{https://en.wikipedia.org/wiki/Gini_coefficient}{gini} +#' Detection of marker feats using the +#' \url{https://en.wikipedia.org/wiki/Gini_coefficient}{gini} #' coefficient is based on the following steps/principles per feat: #' \itemize{ #' \item{1. calculate average expression per cluster} @@ -325,198 +357,233 @@ findScranMarkers_one_vs_all <- function(gobject, #' \item{7. for each feat sort on expression and detection rank and combined score} #' } #' -#' As a results "top gini" feats are feats that are very selectivily expressed in a specific cluster, -#' however not always expressed in all cells of that cluster. In other words highly specific, but +#' As a results "top gini" feats are feats that are very selectivily expressed +#' in a specific cluster, +#' however not always expressed in all cells of that cluster. In other words +#' highly specific, but #' not necessarily sensitive at the single-cell level. 
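
The scoring walked through in the steps above can be reproduced on a toy example. The sketch below uses a generic Gini coefficient, which may differ in detail from Giotto's internal mygini_fun(); the rank and score arithmetic mirrors the function body that follows, and the numbers are invented purely for illustration:

    # generic Gini coefficient (assumed comparable to mygini_fun())
    gini_coef <- function(x) {
        x <- sort(x)
        n <- length(x)
        sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
    }

    # toy per-cluster summaries for a single feature across 4 clusters
    avg_expr <- c(0.1, 0.2, 0.1, 5.0)     # average expression per cluster
    det_frac <- c(0.05, 0.10, 0.05, 0.60) # detection fraction per cluster

    expression_gini <- gini_coef(avg_expr)
    detection_gini  <- gini_coef(det_frac)

    # ranks rescaled to [1, 0.1]; the top-expressing cluster keeps score 1
    expression_rank <- scales::rescale(rank(-avg_expr), to = c(1, 0.1))
    detection_rank  <- scales::rescale(rank(-det_frac), to = c(1, 0.1))

    # combined score, matching the comb_score definition used in the code:
    # (expression_gini * expression_rank) * (detection_gini * detection_rank)
    comb_score <- (expression_gini * expression_rank) *
        (detection_gini * detection_rank)
    # the cluster-specific fourth cluster dominates the combined score
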
#' #' To perform differential expression between custom selected groups of cells #' you need to specify the cell_ID column to parameter \emph{cluster_column} -#' and provide the individual cell IDs to the parameters \emph{group_1} and \emph{group_2} +#' and provide the individual cell IDs to the parameters \emph{group_1} and +#' \emph{group_2} #' -#' By default group names will be created by pasting the different id names within each selected group. +#' By default group names will be created by pasting the different id names +#' within each selected group. #' When you have many different ids in a single group -#' it is recommend to provide names for both groups to \emph{group_1_name} and \emph{group_2_name} +#' it is recommend to provide names for both groups to \emph{group_1_name} and +#' \emph{group_2_name} #' #' @export findGiniMarkers <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - group_1 = NULL, - group_1_name = NULL, - group_2 = NULL, - group_2_name = NULL, - min_expr_gini_score = 0.2, - min_det_gini_score = 0.2, - detection_threshold = 0, - rank_score = 1, - min_feats = 5, - min_genes = NULL) { - - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## select expression values - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - - # cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - if(!cluster_column %in% colnames(cell_metadata[])) { - stop('\n cluster column not found \n') - } - - - # subset clusters - if(!is.null(subset_clusters)) { - - cell_metadata[] = cell_metadata[][get(cluster_column) %in% subset_clusters] - subset_cell_IDs = cell_metadata[][['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = subset_cell_IDs) - - } else if(!is.null(group_1) & !is.null(group_2)) { - - cell_metadata[] = cell_metadata[][get(cluster_column) %in% c(group_1, group_2)] - - # create new pairwise group - if(!is.null(group_1_name)) { - if(!is.character(group_1_name)) stop('group_1_name needs to be a character') - group_1_name = group_1_name - } else { - group_1_name = paste0(group_1, collapse = '_') + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + group_1 = NULL, + group_1_name = NULL, + group_2 = NULL, + group_2_name = NULL, + min_expr_gini_score = 0.2, + min_det_gini_score = 0.2, + detection_threshold = 0, + rank_score = 1, + min_feats = 5, + min_genes = NULL) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") } - if(!is.null(group_2_name)) { - if(!is.character(group_2_name)) stop('group_2_name needs to be a character') - group_2_name = group_2_name - } else { - group_2_name = paste0(group_2, collapse = '_') - } - # data.table variables - pairwise_select_comp = NULL + # Set feat_type and spat_unit + spat_unit <- 
set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - cell_metadata[][, pairwise_select_comp := ifelse(get(cluster_column) %in% group_1, group_1_name, group_2_name)] + ## select expression values + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) - cluster_column = 'pairwise_select_comp' - # expression data - subset_cell_IDs = cell_metadata[][['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = subset_cell_IDs) + # cluster column + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_cell_metadata(gobject, - metadata = cell_metadata, - verbose = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } + if (!cluster_column %in% colnames(cell_metadata[])) { + stop("cluster column not found") + } - # average expression per cluster - aggr_sc_clusters = create_average_DT(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - meta_data_name = cluster_column, - expression_values = values) - aggr_sc_clusters_DT = data.table::as.data.table(aggr_sc_clusters) + # subset clusters + if (!is.null(subset_clusters)) { + cell_metadata[] <- cell_metadata[][ + get(cluster_column) %in% subset_clusters] + subset_cell_IDs <- cell_metadata[][["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = subset_cell_IDs + ) + } else if (!is.null(group_1) & !is.null(group_2)) { + cell_metadata[] <- cell_metadata[][ + get(cluster_column) %in% c(group_1, group_2)] + + # create new pairwise group + if (!is.null(group_1_name)) { + if (!is.character(group_1_name)) + stop("group_1_name needs to be a character") + group_1_name <- group_1_name + } else { + group_1_name <- paste0(group_1, collapse = "_") + } - # data.table variables - feats = NULL + if (!is.null(group_2_name)) { + if (!is.character(group_2_name)) + stop("group_2_name needs to be a character") + group_2_name <- group_2_name + } else { + group_2_name <- paste0(group_2, collapse = "_") + } + # data.table variables + pairwise_select_comp <- NULL + + cell_metadata[][, pairwise_select_comp := ifelse( + get(cluster_column) %in% group_1, group_1_name, group_2_name)] + + cluster_column <- "pairwise_select_comp" + + # expression data + subset_cell_IDs <- cell_metadata[][["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = subset_cell_IDs + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_cell_metadata(gobject, + metadata = cell_metadata, + verbose = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } - aggr_sc_clusters_DT[, feats := rownames(aggr_sc_clusters)] - aggr_sc_clusters_DT_melt = data.table::melt.data.table(aggr_sc_clusters_DT, - variable.name = 'cluster', - id.vars = 'feats', - value.name = 'expression') + # average expression per cluster + aggr_sc_clusters <- create_average_DT( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + meta_data_name = cluster_column, + expression_values = values + ) + aggr_sc_clusters_DT <- 
data.table::as.data.table(aggr_sc_clusters) - ## detection per cluster - aggr_detection_sc_clusters = create_average_detection_DT(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - meta_data_name = cluster_column, - expression_values = values, - detection_threshold = detection_threshold) - aggr_detection_sc_clusters_DT = data.table::as.data.table(aggr_detection_sc_clusters) - aggr_detection_sc_clusters_DT[, feats := rownames(aggr_detection_sc_clusters)] - aggr_detection_sc_clusters_DT_melt = data.table::melt.data.table(aggr_detection_sc_clusters_DT, - variable.name = 'cluster', - id.vars = 'feats', - value.name = 'detection') + # data.table variables + feats <- NULL - ## gini - # data.table variables - expression_gini = detection_gini = detection = NULL + aggr_sc_clusters_DT[, feats := rownames(aggr_sc_clusters)] + aggr_sc_clusters_DT_melt <- data.table::melt.data.table(aggr_sc_clusters_DT, + variable.name = "cluster", + id.vars = "feats", + value.name = "expression" + ) - aggr_sc_clusters_DT_melt[, expression_gini := mygini_fun(expression), by = feats] - aggr_detection_sc_clusters_DT_melt[, detection_gini := mygini_fun(detection), by = feats] + ## detection per cluster + aggr_detection_sc_clusters <- create_average_detection_DT( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + meta_data_name = cluster_column, + expression_values = values, + detection_threshold = detection_threshold + ) + aggr_detection_sc_clusters_DT <- data.table::as.data.table( + aggr_detection_sc_clusters) + aggr_detection_sc_clusters_DT[, feats := rownames( + aggr_detection_sc_clusters)] + aggr_detection_sc_clusters_DT_melt <- data.table::melt.data.table( + aggr_detection_sc_clusters_DT, + variable.name = "cluster", + id.vars = "feats", + value.name = "detection" + ) + + ## gini + # data.table variables + expression_gini <- detection_gini <- detection <- NULL + + aggr_sc_clusters_DT_melt[, expression_gini := mygini_fun( + expression), by = feats] + aggr_detection_sc_clusters_DT_melt[, detection_gini := mygini_fun( + detection), by = feats] - ## combine - aggr_sc = cbind(aggr_sc_clusters_DT_melt, aggr_detection_sc_clusters_DT_melt[,.(detection, detection_gini)]) - ## create combined rank + ## combine + aggr_sc <- cbind(aggr_sc_clusters_DT_melt, + aggr_detection_sc_clusters_DT_melt[ + , .(detection, detection_gini)]) - # expression rank for each feat in all samples - # rescale expression rank range between 1 and 0.1 + ## create combined rank - # data.table variables - expression_rank = cluster = detection_rank = NULL + # expression rank for each feat in all samples + # rescale expression rank range between 1 and 0.1 - aggr_sc[, expression_rank := rank(-expression), by = feats] - aggr_sc[, expression_rank := scales::rescale(expression_rank, to = c(1, 0.1)), by = cluster] + # data.table variables + expression_rank <- cluster <- detection_rank <- NULL - # detection rank for each feat in all samples - # rescale detection rank range between 1 and 0.1 - aggr_sc[, detection_rank := rank(-detection), by = feats] - aggr_sc[, detection_rank := scales::rescale(detection_rank, to = c(1, 0.1)), by = cluster] + aggr_sc[, expression_rank := rank(-expression), by = feats] + aggr_sc[, expression_rank := scales::rescale( + expression_rank, to = c(1, 0.1)), by = cluster] - # create combine score based on rescaled ranks and gini scores + # detection rank for each feat in all samples + # rescale detection rank range between 1 and 0.1 + aggr_sc[, detection_rank := rank(-detection), by = feats] + 
aggr_sc[, detection_rank := scales::rescale( + detection_rank, to = c(1, 0.1)), by = cluster] - # data.table variables - comb_score = comb_rank = NULL + # create combine score based on rescaled ranks and gini scores - aggr_sc[, comb_score := (expression_gini*expression_rank)*(detection_gini*detection_rank)] - setorder(aggr_sc, cluster, -comb_score) - aggr_sc[, comb_rank := 1:.N, by = cluster] + # data.table variables + comb_score <- comb_rank <- NULL - top_feats_scores = aggr_sc[comb_rank <= min_feats | (expression_rank <= rank_score & detection_rank <= rank_score)] - top_feats_scores_filtered = top_feats_scores[comb_rank <= min_feats | (expression > min_expr_gini_score & detection > min_det_gini_score)] - setorder(top_feats_scores_filtered, cluster, comb_rank) + aggr_sc[, comb_score := (expression_gini * expression_rank) * ( + detection_gini * detection_rank)] + setorder(aggr_sc, cluster, -comb_score) + aggr_sc[, comb_rank := 1:.N, by = cluster] + top_feats_scores <- aggr_sc[comb_rank <= min_feats | ( + expression_rank <= rank_score & detection_rank <= rank_score)] + top_feats_scores_filtered <- top_feats_scores[comb_rank <= min_feats | ( + expression > min_expr_gini_score & detection > min_det_gini_score)] + setorder(top_feats_scores_filtered, cluster, comb_rank) - # remove 'cluster_' part if this is not part of the original cluster names - original_uniq_cluster_names = unique(cell_metadata[][[cluster_column]]) - if(sum(grepl('cluster_', original_uniq_cluster_names)) == 0) { - top_feats_scores_filtered[, cluster := gsub(x = cluster, 'cluster_', '')] - } - return(top_feats_scores_filtered) + # remove 'cluster_' part if this is not part of the original cluster names + original_uniq_cluster_names <- unique(cell_metadata[][[cluster_column]]) + if (sum(grepl("cluster_", original_uniq_cluster_names)) == 0) { + top_feats_scores_filtered[, cluster := gsub( + x = cluster, "cluster_", "")] + } + return(top_feats_scores_filtered) } @@ -524,7 +591,8 @@ findGiniMarkers <- function(gobject, #' @title findGiniMarkers_one_vs_all #' @name findGiniMarkers_one_vs_all -#' @description Identify marker feats for all clusters in a one vs all manner based on gini detection and expression scores. +#' @description Identify marker feats for all clusters in a one vs all manner +#' based on gini detection and expression scores. 
#' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -542,283 +610,323 @@ findGiniMarkers <- function(gobject, #' @seealso \code{\link{findGiniMarkers}} #' @export findGiniMarkers_one_vs_all <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - min_expr_gini_score = 0.5, - min_det_gini_score = 0.5, - detection_threshold = 0, - rank_score = 1, - min_feats = 4, - min_genes = NULL, - verbose = TRUE) { - - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## select expression values - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - - # cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - - if(!cluster_column %in% colnames(cell_metadata)) { - stop('\n cluster column not found \n') - } - - if(!is.null(subset_clusters)) { - - cell_metadata = cell_metadata[get(cluster_column) %in% subset_clusters] - subset_cell_IDs = cell_metadata[['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = subset_cell_IDs) - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - } - - - # sort uniq clusters - uniq_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + min_expr_gini_score = 0.5, + min_det_gini_score = 0.5, + detection_threshold = 0, + rank_score = 1, + min_feats = 4, + min_genes = NULL, + verbose = TRUE) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") + } + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # GINI - progressr::with_progress({ - pb = progressr::progressor(along = uniq_clusters) - result_list = lapply( - seq_along(uniq_clusters), - function(clus_i) { - selected_clus = uniq_clusters[clus_i] - other_clus = uniq_clusters[uniq_clusters != selected_clus] + ## select expression values + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) - if(verbose == TRUE) { - cat('\n start with cluster ', selected_clus, '\n') - } - markers = findGiniMarkers(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = values, - cluster_column = cluster_column, - group_1 = selected_clus, - group_2 = other_clus, - min_expr_gini_score = min_expr_gini_score, - min_det_gini_score = min_det_gini_score, - detection_threshold = detection_threshold, - rank_score = rank_score, - min_feats = min_feats) - - # filter steps - #clus_name = paste0('cluster_', selected_clus) - - # data.table variables 
- cluster = NULL + # cluster column + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) - filtered_table = markers[cluster == selected_clus] + if (!cluster_column %in% colnames(cell_metadata)) { + stop("\n cluster column not found \n") + } - pb(message = c('cluster ', clus_i, '/', length(uniq_clusters))) - return(filtered_table) - } - ) - }) + if (!is.null(subset_clusters)) { + cell_metadata <- cell_metadata[get(cluster_column) %in% subset_clusters] + subset_cell_IDs <- cell_metadata[["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = subset_cell_IDs + ) + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + } - return(do.call('rbind', result_list)) + # sort uniq clusters + uniq_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) + + + # GINI + progressr::with_progress({ + pb <- progressr::progressor(along = uniq_clusters) + result_list <- lapply( + seq_along(uniq_clusters), + function(clus_i) { + selected_clus <- uniq_clusters[clus_i] + other_clus <- uniq_clusters[uniq_clusters != selected_clus] + + if (verbose == TRUE) { + cat("start with cluster ", selected_clus) + } + + markers <- findGiniMarkers( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = values, + cluster_column = cluster_column, + group_1 = selected_clus, + group_2 = other_clus, + min_expr_gini_score = min_expr_gini_score, + min_det_gini_score = min_det_gini_score, + detection_threshold = detection_threshold, + rank_score = rank_score, + min_feats = min_feats + ) + + # filter steps + + # data.table variables + cluster <- NULL + + filtered_table <- markers[cluster == selected_clus] + + pb(message = c("cluster ", clus_i, "/", length(uniq_clusters))) + return(filtered_table) + } + ) + }) + + return(do.call("rbind", result_list)) } #' @title findMastMarkers #' @name findMastMarkers -#' @description Identify marker feats for selected clusters based on the MAST package. +#' @description Identify marker feats for selected clusters based on the +#' MAST package. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit #' @param expression_values feat expression values to use #' @param cluster_column clusters to use -#' @param group_1 group 1 cluster IDs from cluster_column for pairwise comparison +#' @param group_1 group 1 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_1_name custom name for group_1 clusters -#' @param group_2 group 2 cluster IDs from cluster_column for pairwise comparison +#' @param group_2 group 2 cluster IDs from cluster_column for pairwise +#' comparison #' @param group_2_name custom name for group_2 clusters #' @param adjust_columns column in pDataDT to adjust for (e.g. detection rate) #' @param verbose be verbose #' @param ... additional parameters for the zlm function in MAST #' @return data.table with marker feats -#' @details This is a minimal convenience wrapper around the \code{\link[MAST]{zlm}} -#' from the MAST package to detect differentially expressed feats. Caution: with large datasets +#' @details This is a minimal convenience wrapper around the +#' \code{\link[MAST]{zlm}} +#' from the MAST package to detect differentially expressed feats. 
Caution: +#' with large datasets #' MAST might take a long time to run and finish #' @export findMastMarkers <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - group_1 = NULL, - group_1_name = NULL, - group_2 = NULL, - group_2_name = NULL, - adjust_columns = NULL, - verbose = FALSE, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # verify if optional package is installed - package_check(pkg_name = "MAST", repository = "Bioc") - - # print message with information # - if(verbose) wrap_msg("using 'MAST' to detect marker feats. If used in published research, please cite: - McDavid A, Finak G, Yajima M (2020). - MAST: Model-based Analysis of Single Cell Transcriptomics. R package version 1.14.0, - https://github.com/RGLab/MAST/.") - - ## select expression values to use - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - - ## cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - if(!cluster_column %in% colnames(cell_metadata[])) { - stop('\n cluster column not found \n') - } - - ## select group ids - if(is.null(group_1) | is.null(group_2)) { - stop('\n specificy group ids for both group_1 and group_2 \n') - } - - ## subset data based on group_1 and group_2 - cell_metadata[] = cell_metadata[][get(cluster_column) %in% c(group_1, group_2)] - if(nrow(cell_metadata[]) == 0) { - stop('\n there are no cells for group_1 or group_2, check cluster column \n') - } - - ## create new pairwise group - if(is.null(group_1_name)) group_1_name = paste0(group_1, collapse = '_') - if(is.null(group_2_name)) group_2_name = paste0(group_2, collapse = '_') - - # data.table variables - pairwise_select_comp = NULL - - cell_metadata[][, pairwise_select_comp := ifelse(get(cluster_column) %in% group_1, group_1_name, group_2_name)] - - if(nrow(cell_metadata[][pairwise_select_comp == group_1_name]) == 0) { - stop('\n there are no cells for group_1, check cluster column \n') - } - - if(nrow(cell_metadata[][pairwise_select_comp == group_2_name]) == 0) { - stop('\n there are no cells for group_2, check cluster column \n') - } - - cluster_column = 'pairwise_select_comp' - - # expression data - subset_cell_IDs = cell_metadata[][['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = subset_cell_IDs) - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_cell_metadata(gobject, - metadata = cell_metadata, - verbose = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - - ## START MAST ## - ## create mast object ## - # expression data - values = match.arg(expression_values, choices = unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_data = get_expression_values(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - values = values, - output = 'matrix') - # column & row data - column_data <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - setnames(column_data, 'cell_ID', 'wellKey') - row_data <- getFeatureMetadata(gobject, - spat_unit = spat_unit, - feat_type = 
feat_type, - output = 'data.table', - copy_obj = TRUE) - setnames(row_data, 'feat_ID', 'primerid') - # mast object - mast_data = MAST::FromMatrix(exprsArray = expr_data, - cData = column_data, - fData = row_data) - - ## set conditions and relevel - cond <- factor(SingleCellExperiment::colData(mast_data)[[cluster_column]]) - cond <- stats::relevel(cond, group_2_name) - mast_data@colData[[cluster_column]] <- cond - - ## create formula and run MAST feat regressions - if(!is.null(adjust_columns)) { - myformula = stats::as.formula(paste0("~ 1 + ",cluster_column, " + ", paste(adjust_columns, collapse = " + "))) - } else { - myformula = stats::as.formula(paste0("~ 1 + ",cluster_column)) - } - zlmCond <- MAST::zlm(formula = myformula, sca = mast_data, ...) - - ## run LRT and return data.table with results - - # data.table variables - contrast = component = primerid = `Pr(>Chisq)` = coef = ci.hi = ci.lo = fdr = NULL - - sample = paste0(cluster_column, group_1_name) - summaryCond <- MAST::summary(zlmCond, doLRT=sample) - summaryDt <- summaryCond$datatable - fcHurdle <- merge(summaryDt[contrast==sample & component=='H',.(primerid, `Pr(>Chisq)`)], #hurdle P values - summaryDt[contrast==sample & component=='logFC', .(primerid, coef, ci.hi, ci.lo)], by='primerid') #logFC coefficients - fcHurdle[,fdr := stats::p.adjust(`Pr(>Chisq)`, 'fdr')] - data.table::setorder(fcHurdle, fdr) - - # data.table variables - cluster = NULL - - fcHurdle[, cluster := paste0(group_1_name,'_vs_', group_2_name)] - data.table::setnames(fcHurdle, old = 'primerid', new = 'feats') - - return(fcHurdle) + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + group_1 = NULL, + group_1_name = NULL, + group_2 = NULL, + group_2_name = NULL, + adjust_columns = NULL, + verbose = FALSE, + ...) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # verify if optional package is installed + package_check(pkg_name = "MAST", repository = "Bioc") + + # print message with information # + if (verbose) + message("using 'MAST' to detect marker feats. If used in published + research, please cite: McDavid A, Finak G, Yajima M (2020). + MAST: Model-based Analysis of Single Cell Transcriptomics. 
+ R package version 1.14.0, https://github.com/RGLab/MAST/.") + + ## select expression values to use + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + + ## cluster column + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + if (!cluster_column %in% colnames(cell_metadata[])) { + stop("cluster column not found") + } + + ## select group ids + if (is.null(group_1) | is.null(group_2)) { + stop("specificy group ids for both group_1 and group_2") + } + + ## subset data based on group_1 and group_2 + cell_metadata[] <- cell_metadata[][ + get(cluster_column) %in% c(group_1, group_2)] + if (nrow(cell_metadata[]) == 0) { + stop("there are no cells for group_1 or group_2, check cluster column") + } + + ## create new pairwise group + if (is.null(group_1_name)) group_1_name <- paste0(group_1, collapse = "_") + if (is.null(group_2_name)) group_2_name <- paste0(group_2, collapse = "_") + + # data.table variables + pairwise_select_comp <- NULL + + cell_metadata[][, pairwise_select_comp := ifelse( + get(cluster_column) %in% group_1, group_1_name, group_2_name)] + + if (nrow(cell_metadata[][pairwise_select_comp == group_1_name]) == 0) { + stop("there are no cells for group_1, check cluster column") + } + + if (nrow(cell_metadata[][pairwise_select_comp == group_2_name]) == 0) { + stop("there are no cells for group_2, check cluster column") + } + + cluster_column <- "pairwise_select_comp" + + # expression data + subset_cell_IDs <- cell_metadata[][["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = subset_cell_IDs + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_cell_metadata(gobject, + metadata = cell_metadata, + verbose = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + ## START MAST ## + ## create mast object ## + # expression data + values <- match.arg( + expression_values, + choices = unique(c("normalized", "scaled", "custom", + expression_values))) + expr_data <- getExpression( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + values = values, + output = "matrix" + ) + # column & row data + column_data <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + setnames(column_data, "cell_ID", "wellKey") + row_data <- getFeatureMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + setnames(row_data, "feat_ID", "primerid") + # mast object + mast_data <- MAST::FromMatrix( + exprsArray = expr_data, + cData = column_data, + fData = row_data + ) + ## set conditions and relevel + cond <- factor(SingleCellExperiment::colData(mast_data)[[cluster_column]]) + cond <- stats::relevel(cond, group_2_name) + mast_data@colData[[cluster_column]] <- cond + + ## create formula and run MAST feat regressions + if (!is.null(adjust_columns)) { + myformula <- stats::as.formula(paste0( + "~ 1 + ", cluster_column, " + ", + paste(adjust_columns, collapse = " + "))) + } else { + myformula <- stats::as.formula(paste0("~ 1 + ", cluster_column)) + } + zlmCond <- MAST::zlm(formula = myformula, sca = mast_data, ...) 
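
The model formula assembled just above is a plain additive formula on the pairwise comparison column, optionally extended with the adjust_columns covariates. A quick sketch of the two shapes it can take; the covariate name is a placeholder, not a documented column:

    cluster_column <- "pairwise_select_comp"
    adjust_columns <- c("detection_rate")   # hypothetical covariate column

    stats::as.formula(paste0("~ 1 + ", cluster_column))
    # ~1 + pairwise_select_comp

    stats::as.formula(paste0(
        "~ 1 + ", cluster_column, " + ",
        paste(adjust_columns, collapse = " + ")))
    # ~1 + pairwise_select_comp + detection_rate
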
+ + ## run LRT and return data.table with results + + # data.table variables + contrast <- component <- primerid <- `Pr(>Chisq)` <- coef <- + ci.hi <- ci.lo <- fdr <- NULL + + sample <- paste0(cluster_column, group_1_name) + summaryCond <- MAST::summary(zlmCond, doLRT = sample) + summaryDt <- summaryCond$datatable + fcHurdle <- merge(summaryDt[ + contrast == sample & component == "H", + .(primerid, `Pr(>Chisq)`)], # hurdle P values + summaryDt[ + contrast == sample & component == "logFC", + .(primerid, coef, ci.hi, ci.lo)], + by = "primerid" + ) # logFC coefficients + fcHurdle[, fdr := stats::p.adjust(`Pr(>Chisq)`, "fdr")] + data.table::setorder(fcHurdle, fdr) + + # data.table variables + cluster <- NULL + + fcHurdle[, cluster := paste0(group_1_name, "_vs_", group_2_name)] + data.table::setnames(fcHurdle, old = "primerid", new = "feats") + + return(fcHurdle) } @@ -826,7 +934,8 @@ findMastMarkers <- function(gobject, #' @title findMastMarkers_one_vs_all #' @name findMastMarkers_one_vs_all -#' @description Identify marker feats for all clusters in a one vs all manner based on the MAST package. +#' @description Identify marker feats for all clusters in a one vs all manner +#' based on the MAST package. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -843,113 +952,120 @@ findMastMarkers <- function(gobject, #' @return data.table with marker feats #' @seealso \code{\link{findMastMarkers}} #' @export -findMastMarkers_one_vs_all = function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - adjust_columns = NULL, - pval = 0.001, - logFC = 1, - min_feats = 10, - min_genes = NULL, - verbose = TRUE, - ...) { - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # verify if optional package is installed - package_check(pkg_name = "MAST", repository = "Bioc") - - # print message with information # - if(verbose) message("using 'MAST' to detect marker feats. If used in published research, please cite: - McDavid A, Finak G, Yajima M (2020). - MAST: Model-based Analysis of Single Cell Transcriptomics. 
R package version 1.14.0, - https://github.com/RGLab/MAST/.") - - - ## cluster column - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - if(!cluster_column %in% colnames(cell_metadata)) { - stop('\n cluster column not found \n') - } - - # restrict to a subset of clusters - if(!is.null(subset_clusters)) { - - cell_metadata = cell_metadata[get(cluster_column) %in% subset_clusters] - subset_cell_IDs = cell_metadata[['cell_ID']] - gobject = subsetGiotto(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - cell_ids = subset_cell_IDs, - verbose = FALSE) - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - } +findMastMarkers_one_vs_all <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + adjust_columns = NULL, + pval = 0.001, + logFC = 1, + min_feats = 10, + min_genes = NULL, + verbose = TRUE, + ...) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") + } - ## sort uniq clusters - uniq_clusters = mixedsort(unique(cell_metadata[[cluster_column]])) + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # save list - result_list = list() + # verify if optional package is installed + package_check(pkg_name = "MAST", repository = "Bioc") - for(clus_i in seq_along(uniq_clusters)) { + # print message with information # + if (verbose) + message("using 'MAST' to detect marker feats. If used in published + research, please cite: McDavid A, Finak G, Yajima M (2020). + MAST: Model-based Analysis of Single Cell Transcriptomics. 
+ R package version 1.14.0, https://github.com/RGLab/MAST/.") - selected_clus = uniq_clusters[clus_i] - other_clus = uniq_clusters[uniq_clusters != selected_clus] - if(verbose == TRUE) { - cat('\n start with cluster ', selected_clus, '\n') + ## cluster column + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + if (!cluster_column %in% colnames(cell_metadata)) { + stop("cluster column not found") + } + + # restrict to a subset of clusters + if (!is.null(subset_clusters)) { + cell_metadata <- cell_metadata[get(cluster_column) %in% subset_clusters] + subset_cell_IDs <- cell_metadata[["cell_ID"]] + gobject <- subsetGiotto( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + cell_ids = subset_cell_IDs, + verbose = FALSE + ) + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) } - temp_mast_markers = findMastMarkers(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - adjust_columns = adjust_columns, - group_1 = selected_clus, - group_1_name = selected_clus, - group_2 = other_clus, - group_2_name = 'others', - verbose = FALSE) + ## sort uniq clusters + uniq_clusters <- mixedsort(unique(cell_metadata[[cluster_column]])) + + # save list + result_list <- list() - result_list[[clus_i]] = temp_mast_markers + for (clus_i in seq_along(uniq_clusters)) { + selected_clus <- uniq_clusters[clus_i] + other_clus <- uniq_clusters[uniq_clusters != selected_clus] - } + if (verbose == TRUE) { + cat("start with cluster ", selected_clus) + } - # filter or retain only selected marker feats - result_dt = do.call('rbind', result_list) + temp_mast_markers <- findMastMarkers( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + adjust_columns = adjust_columns, + group_1 = selected_clus, + group_1_name = selected_clus, + group_2 = other_clus, + group_2_name = "others", + verbose = FALSE + ) + + result_list[[clus_i]] <- temp_mast_markers + } - # data.table variables - ranking = fdr = coef = NULL + # filter or retain only selected marker feats + result_dt <- do.call("rbind", result_list) - result_dt[, ranking := 1:.N, by = 'cluster'] - filtered_result_dt = result_dt[ranking <= min_feats | (fdr < pval & coef > logFC)] + # data.table variables + ranking <- fdr <- coef <- NULL - return(filtered_result_dt) + result_dt[, ranking := seq_len(.N), by = "cluster"] + filtered_result_dt <- result_dt[ + ranking <= min_feats | (fdr < pval & coef > logFC)] + return(filtered_result_dt) } @@ -967,9 +1083,12 @@ findMastMarkers_one_vs_all = function(gobject, #' @param cluster_column clusters to use #' @param method method to use to detect differentially expressed feats #' @param subset_clusters selection of clusters to compare -#' @param group_1 group 1 cluster IDs from cluster_column for pairwise comparison -#' @param group_2 group 2 cluster IDs from cluster_column for pairwise comparison -#' @param min_expr_gini_score gini: filter on minimum gini coefficient for expression +#' @param group_1 group 1 cluster IDs from cluster_column for pairwise +#' comparison +#' @param group_2 group 2 cluster IDs from cluster_column for pairwise +#' comparison +#' @param min_expr_gini_score gini: filter on minimum gini coefficient for +#' expression #' @param 
min_det_gini_score gini: filter minimum gini coefficient for detection #' @param detection_threshold gini: detection threshold for feat expression #' @param rank_score gini: rank scores to include @@ -977,96 +1096,100 @@ findMastMarkers_one_vs_all = function(gobject, #' @param min_genes deprecated, use min_feats #' @param group_1_name mast: custom name for group_1 clusters #' @param group_2_name mast: custom name for group_2 clusters -#' @param adjust_columns mast: column in pDataDT to adjust for (e.g. detection rate) -#' @param ... additional parameters for the findMarkers function in scran or zlm function in MAST +#' @param adjust_columns mast: column in pDataDT to adjust for +#' (e.g. detection rate) +#' @param ... additional parameters for the findMarkers function in scran or +#' zlm function in MAST #' @return data.table with marker feats -#' @details Wrapper for all individual functions to detect marker feats for clusters. -#' @seealso \code{\link{findScranMarkers}}, \code{\link{findGiniMarkers}} and \code{\link{findMastMarkers}} +#' @details Wrapper for all individual functions to detect marker feats for +#' clusters. +#' @seealso \code{\link{findScranMarkers}}, \code{\link{findGiniMarkers}} and +#' \code{\link{findMastMarkers}} #' @export findMarkers <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column = NULL, - method = c('scran','gini','mast'), - subset_clusters = NULL, - group_1 = NULL, - group_2 = NULL, - min_expr_gini_score = 0.5, - min_det_gini_score = 0.5, - detection_threshold = 0, - rank_score = 1, - min_feats = 4, - min_genes = NULL, - group_1_name = NULL, - group_2_name = NULL, - adjust_columns = NULL, - ...) { - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # input - if(is.null(cluster_column)) { - stop('A valid cluster column needs to be given to cluster_column, see pDataDT()') - } - - # select method - method = match.arg(method, choices = c('scran','gini','mast')) - - if(method == 'scran') { - - markers_result = findScranMarkers(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - subset_clusters = subset_clusters, - group_1 = group_1, - group_2 = group_2, - group_1_name = group_1_name, - group_2_name = group_2_name, - ...) - } else if(method == 'gini') { - - markers_result <- findGiniMarkers(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - subset_clusters = subset_clusters, - group_1 = group_1, - group_2 = group_2, - group_1_name = group_1_name, - group_2_name = group_2_name, - min_expr_gini_score = min_expr_gini_score, - min_det_gini_score = min_det_gini_score, - detection_threshold = detection_threshold, - rank_score = rank_score, - min_feats = min_feats) - - } else if(method == 'mast') { - - markers_result <- findMastMarkers(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - group_1 = group_1, - group_1_name = group_1_name, - group_2 = group_2, - group_2_name = group_2_name, - adjust_columns = adjust_columns, - ...) 
- - } - - return(markers_result) + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column = NULL, + method = c("scran", "gini", "mast"), + subset_clusters = NULL, + group_1 = NULL, + group_2 = NULL, + min_expr_gini_score = 0.5, + min_det_gini_score = 0.5, + detection_threshold = 0, + rank_score = 1, + min_feats = 4, + min_genes = NULL, + group_1_name = NULL, + group_2_name = NULL, + adjust_columns = NULL, + ...) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") + } + + # input + if (is.null(cluster_column)) { + stop("A valid cluster column needs to be given to cluster_column, + see pDataDT()") + } + + # select method + method <- match.arg(method, choices = c("scran", "gini", "mast")) + + if (method == "scran") { + markers_result <- findScranMarkers( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + subset_clusters = subset_clusters, + group_1 = group_1, + group_2 = group_2, + group_1_name = group_1_name, + group_2_name = group_2_name, + ... + ) + } else if (method == "gini") { + markers_result <- findGiniMarkers( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + subset_clusters = subset_clusters, + group_1 = group_1, + group_2 = group_2, + group_1_name = group_1_name, + group_2_name = group_2_name, + min_expr_gini_score = min_expr_gini_score, + min_det_gini_score = min_det_gini_score, + detection_threshold = detection_threshold, + rank_score = rank_score, + min_feats = min_feats + ) + } else if (method == "mast") { + markers_result <- findMastMarkers( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + group_1 = group_1, + group_1_name = group_1_name, + group_2 = group_2, + group_2_name = group_2_name, + adjust_columns = adjust_columns, + ... + ) + } + return(markers_result) } @@ -1084,103 +1207,98 @@ findMarkers <- function(gobject, #' @param logFC scan & mast: filter on logFC #' @param min_feats minimum feats to keep per cluster, overrides pval and logFC #' @param min_genes deprecated, use min_feats -#' @param min_expr_gini_score gini: filter on minimum gini coefficient for expression +#' @param min_expr_gini_score gini: filter on minimum gini coefficient for +#' expression #' @param min_det_gini_score gini: filter minimum gini coefficient for detection #' @param detection_threshold gini: detection threshold for feat expression #' @param rank_score gini: rank scores to include -#' @param adjust_columns mast: column in pDataDT to adjust for (e.g. detection rate) +#' @param adjust_columns mast: column in pDataDT to adjust for +#' (e.g. detection rate) #' @param verbose be verbose -#' @param ... additional parameters for the findMarkers function in scran or zlm function in MAST +#' @param ... additional parameters for the findMarkers function in scran or +#' zlm function in MAST #' @return data.table with marker feats -#' @details Wrapper for all one vs all functions to detect marker feats for clusters. 
-#' @seealso \code{\link{findScranMarkers_one_vs_all}}, \code{\link{findGiniMarkers_one_vs_all}} and \code{\link{findMastMarkers_one_vs_all}} +#' @details Wrapper for all one vs all functions to detect marker feats for +#' clusters. +#' @seealso \code{\link{findScranMarkers_one_vs_all}}, +#' \code{\link{findGiniMarkers_one_vs_all}} and +#' \code{\link{findMastMarkers_one_vs_all}} #' @export findMarkers_one_vs_all <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - cluster_column, - subset_clusters = NULL, - method = c('scran','gini','mast'), - # scran & mast - pval = 0.01, - logFC = 0.5, - min_feats = 10, - min_genes = NULL, - # gini - min_expr_gini_score = 0.5, - min_det_gini_score = 0.5, - detection_threshold = 0, - rank_score = 1, - # mast specific - adjust_columns = NULL, - verbose = TRUE, - ...) { - - - ## deprecated arguments - if(!is.null(min_genes)) { - min_feats = min_genes - warning('min_genes argument is deprecated, use min_feats argument in the future \n') - } - - # select method - method = match.arg(method, choices = c('scran','gini','mast')) - - if(method == 'scran') { - - markers_result = findScranMarkers_one_vs_all(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - subset_clusters = subset_clusters, - pval = pval, - logFC = logFC, - min_feats = min_feats, - verbose = verbose, - ...) - } else if(method == 'gini') { - - markers_result = findGiniMarkers_one_vs_all(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - subset_clusters = subset_clusters, - min_expr_gini_score = min_expr_gini_score, - min_det_gini_score = min_det_gini_score, - detection_threshold = detection_threshold, - min_feats = min_feats, - verbose = verbose) - - } else if(method == 'mast') { - - markers_result = findMastMarkers_one_vs_all(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - cluster_column = cluster_column, - subset_clusters = subset_clusters, - adjust_columns = adjust_columns, - pval = pval, - logFC = logFC, - min_feats = min_feats, - verbose = verbose, - ...) - - } - - return(markers_result) + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + cluster_column, + subset_clusters = NULL, + method = c("scran", "gini", "mast"), + # scran & mast + pval = 0.01, + logFC = 0.5, + min_feats = 10, + min_genes = NULL, + # gini + min_expr_gini_score = 0.5, + min_det_gini_score = 0.5, + detection_threshold = 0, + rank_score = 1, + # mast specific + adjust_columns = NULL, + verbose = TRUE, + ...) { + ## deprecated arguments + if (!is.null(min_genes)) { + min_feats <- min_genes + warning("min_genes argument is deprecated, use min_feats argument in + the future") + } + # select method + method <- match.arg(method, choices = c("scran", "gini", "mast")) + + if (method == "scran") { + markers_result <- findScranMarkers_one_vs_all( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + subset_clusters = subset_clusters, + pval = pval, + logFC = logFC, + min_feats = min_feats, + verbose = verbose, + ... 
+ ) + } else if (method == "gini") { + markers_result <- findGiniMarkers_one_vs_all( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + subset_clusters = subset_clusters, + min_expr_gini_score = min_expr_gini_score, + min_det_gini_score = min_det_gini_score, + detection_threshold = detection_threshold, + min_feats = min_feats, + verbose = verbose + ) + } else if (method == "mast") { + markers_result <- findMastMarkers_one_vs_all( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + cluster_column = cluster_column, + subset_clusters = subset_clusters, + adjust_columns = adjust_columns, + pval = pval, + logFC = logFC, + min_feats = min_feats, + verbose = verbose, + ... + ) + } + return(markers_result) } - - - - - - - - - diff --git a/R/dimension_reduction.R b/R/dimension_reduction.R index 8b1ba4666..cba512af1 100644 --- a/R/dimension_reduction.R +++ b/R/dimension_reduction.R @@ -1,5 +1,3 @@ - - # * Dimension Reduction Object Creation # # ! Moved to classes.R @@ -21,96 +19,98 @@ #' @param seed_number seed number to use #' @keywords internal #' @return list of eigenvalues, loadings and pca coordinates -.run_pca_factominer = function(x, - ncp = 100, - scale = TRUE, - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - ...) { - - # verify if optional package is installed - package_check(pkg_name = "FactoMineR", repository = "CRAN") - - if(!methods::is(x, 'matrix')) { - x = as.matrix(x) - } - - if(isTRUE(rev)) { - - x = t_flex(x) - - if(ncp > nrow(x)) { - warning("ncp > nrow(x), will be set to nrow(x)") - ncp = nrow(x) - } - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) +.run_pca_factominer <- function(x, + ncp = 100, + scale = TRUE, + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + ...) { + # verify if optional package is installed + package_check(pkg_name = "FactoMineR", repository = "CRAN") + + if (!methods::is(x, "matrix")) { + x <- as.matrix(x) } - pca_res = FactoMineR::PCA(X = x, ncp = ncp, scale.unit = scale, graph = F, ...) + if (isTRUE(rev)) { + x <- t_flex(x) - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } + if (ncp > nrow(x)) { + warning("ncp > nrow(x), will be set to nrow(x)") + ncp <- nrow(x) + } - # eigenvalues - eigenvalues = pca_res$eig[,1] + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } - # PC loading - loadings = pca_res$ind$coord - rownames(loadings) = rownames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) + pca_res <- FactoMineR::PCA( + X = x, ncp = ncp, scale.unit = scale, graph = FALSE, ...) 
- # coordinates - coords = sweep(pca_res$var$coord, 2, sqrt(eigenvalues[1:ncp]), FUN = "/") - rownames(coords) = colnames(x) - colnames(coords) = paste0('Dim.', 1:ncol(coords)) + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) + # eigenvalues + eigenvalues <- pca_res$eig[, 1] - } else { + # PC loading + loadings <- pca_res$ind$coord + rownames(loadings) <- rownames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) - if(ncp > ncol(x)) { - warning("ncp > ncol(x), will be set to ncol(x)") - ncp = ncol(x) - } + # coordinates + coords <- sweep(pca_res$var$coord, + 2, sqrt(eigenvalues[1:ncp]), FUN = "/") + rownames(coords) <- colnames(x) + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - } + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } else { + if (ncp > ncol(x)) { + warning("ncp > ncol(x), will be set to ncol(x)") + ncp <- ncol(x) + } - pca_res = FactoMineR::PCA(X = x, ncp = ncp, scale.unit = scale, graph = F, ...) + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } + pca_res <- FactoMineR::PCA( + X = x, ncp = ncp, scale.unit = scale, graph = FALSE, ...) - # eigenvalues - eigenvalues = pca_res$eig[,1] + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - # PC loading - loadings = sweep(pca_res$var$coord, 2, sqrt(eigenvalues[1:ncp]), FUN = "/") - rownames(loadings) = colnames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) + # eigenvalues + eigenvalues <- pca_res$eig[, 1] - # coordinates - coords = pca_res$ind$coord - rownames(coords) = rownames(x) - colnames(coords) = paste0('Dim.', 1:ncol(coords)) + # PC loading + loadings <- sweep( + pca_res$var$coord, 2, sqrt(eigenvalues[1:ncp]), FUN = "/") + rownames(loadings) <- colnames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) + # coordinates + coords <- pca_res$ind$coord + rownames(coords) <- rownames(x) + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) - } + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } - vmsg(.is_debug = TRUE, "finished .run_pca_factominer, method == factominer") + vmsg(.is_debug = TRUE, + "finished .run_pca_factominer, method == factominer") - return(result) + return(result) } @@ -128,113 +128,123 @@ #' @param BPPARAM BiocParallelParam object #' @keywords internal #' @return list of eigenvalues, loadings and pca coordinates -.run_pca_biocsingular = function(x, - ncp = 100, - center = TRUE, - scale = TRUE, - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - BSPARAM = c('irlba', 'exact', 'random'), - BPPARAM = BiocParallel::SerialParam(), - ...) { - - BSPARAM = match.arg(BSPARAM, choices = c('irlba', 'exact', 'random')) - - min_ncp = min(dim(x)) - - if(ncp >= min_ncp) { - warning("ncp >= minimum dimension of x, will be set to minimum dimension of x - 1") - ncp = min_ncp-1 - } - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - } - - if(isTRUE(rev)) { - - x = t_flex(x) - - if(BSPARAM == 'irlba') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::IrlbaParam(), - BPPARAM = BPPARAM, - ...) 
- } else if(BSPARAM == 'exact') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::ExactParam(), - BPPARAM = BPPARAM, - ...) - } else if(BSPARAM == 'random') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::RandomParam(), - BPPARAM = BPPARAM, - ...) +.run_pca_biocsingular <- function(x, + ncp = 100, + center = TRUE, + scale = TRUE, + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + BSPARAM = c("irlba", "exact", "random"), + BPPARAM = BiocParallel::SerialParam(), + ...) { + BSPARAM <- match.arg(BSPARAM, choices = c("irlba", "exact", "random")) + + min_ncp <- min(dim(x)) + + if (ncp >= min_ncp) { + warning("ncp >= minimum dimension of x, will be set to + minimum dimension of x - 1") + ncp <- min_ncp - 1 } + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } + if (isTRUE(rev)) { + x <- t_flex(x) + + if (BSPARAM == "irlba") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::IrlbaParam(), + BPPARAM = BPPARAM, + ... + ) + } else if (BSPARAM == "exact") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::ExactParam(), + BPPARAM = BPPARAM, + ... + ) + } else if (BSPARAM == "random") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::RandomParam(), + BPPARAM = BPPARAM, + ... + ) + } - # eigenvalues - eigenvalues = pca_res$sdev^2 - # PC loading - loadings = pca_res$x - rownames(loadings) = rownames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) - # coordinates - coords = pca_res$rotation - rownames(coords) = colnames(x) - colnames(coords) = paste0('Dim.', 1:ncol(coords)) - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) - - } else { - - if(BSPARAM == 'irlba') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::IrlbaParam(), - BPPARAM = BPPARAM, - ...) - } else if(BSPARAM == 'exact') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::ExactParam(), - BPPARAM = BPPARAM, - ...) - } else if(BSPARAM == 'random') { - pca_res = BiocSingular::runPCA(x = x, rank = ncp, - center = center, scale = scale, - BSPARAM = BiocSingular::RandomParam(), - BPPARAM = BPPARAM, - ...) - } - # eigenvalues - eigenvalues = pca_res$sdev^2 - # PC loading - loadings = pca_res$rotation - rownames(loadings) = colnames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) - # coordinates - coords = pca_res$x - rownames(coords) = rownames(x) - colnames(coords) = paste0('Dim.', 1:ncol(coords)) - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) - } + # eigenvalues + eigenvalues <- pca_res$sdev^2 + # PC loading + loadings <- pca_res$x + rownames(loadings) <- rownames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) + # coordinates + coords <- pca_res$rotation + rownames(coords) <- colnames(x) + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } else { + if (BSPARAM == "irlba") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::IrlbaParam(), + BPPARAM = BPPARAM, + ... 
+ ) + } else if (BSPARAM == "exact") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::ExactParam(), + BPPARAM = BPPARAM, + ... + ) + } else if (BSPARAM == "random") { + pca_res <- BiocSingular::runPCA( + x = x, rank = ncp, + center = center, scale = scale, + BSPARAM = BiocSingular::RandomParam(), + BPPARAM = BPPARAM, + ... + ) + } - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } + # eigenvalues + eigenvalues <- pca_res$sdev^2 + # PC loading + loadings <- pca_res$rotation + rownames(loadings) <- colnames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) + # coordinates + coords <- pca_res$x + rownames(coords) <- rownames(x) + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } - vmsg(.is_debug = TRUE, "finished .run_pca_biocsingular, method ==", BSPARAM) + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - return(result) + vmsg(.is_debug = TRUE, "finished .run_pca_biocsingular, method ==", BSPARAM) + + return(result) } @@ -253,46 +263,57 @@ #' @param verbose verbosity #' @keywords internal #' @return subsetted matrix based on selected features -.create_feats_to_use_matrix = function(gobject, - feat_type = NULL, - spat_unit = NULL, - sel_matrix, - feats_to_use, - verbose = FALSE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # cell metadata - feat_metadata = fDataDT(gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # for hvf features - if(is.character(feats_to_use) && length(feats_to_use) == 1) { - if(feats_to_use %in% colnames(feat_metadata)) { - vmsg(.v = verbose, str_double_quote(feats_to_use), - 'column was found in the feats metadata information and will be used to select highly variable features') - feats_to_use = feat_metadata[get(feats_to_use) == 'yes'][['feat_ID']] - sel_matrix = sel_matrix[rownames(sel_matrix) %in% feats_to_use, ] +.create_feats_to_use_matrix <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + sel_matrix, + feats_to_use, + verbose = FALSE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # cell metadata + feat_metadata <- fDataDT(gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # for hvf features + if (is.character(feats_to_use) && length(feats_to_use) == 1) { + if (feats_to_use %in% colnames(feat_metadata)) { + vmsg( + .v = verbose, str_double_quote(feats_to_use), + "column was found in the feats metadata information and will be + used to select highly variable features" + ) + feats_to_use <- feat_metadata[ + get(feats_to_use) == "yes"][["feat_ID"]] + sel_matrix <- sel_matrix[rownames(sel_matrix) %in% feats_to_use, ] + } else { + vmsg( + .v = verbose, str_double_quote(feats_to_use), + "was not found in the gene metadata information. + all genes will be used." + ) + } } else { - vmsg(.v = verbose, str_double_quote(feats_to_use), - 'was not found in the gene metadata information. 
- all genes will be used.') + vmsg(.v = verbose, + "a custom vector of genes will be used to subset the matrix") + sel_matrix <- sel_matrix[rownames(sel_matrix) %in% feats_to_use, ] } - } else { - vmsg(.v = verbose, 'a custom vector of genes will be used to subset the matrix') - sel_matrix = sel_matrix[rownames(sel_matrix) %in% feats_to_use, ] - } - vmsg(.v = verbose, .is_debug = TRUE, 'class of selected matrix: ', class(sel_matrix)) + vmsg(.v = verbose, .is_debug = TRUE, + "class of selected matrix: ", class(sel_matrix)) - return(sel_matrix) + return(sel_matrix) } @@ -328,175 +349,194 @@ #' \item feats_to_use = NULL: will use all features from the selected matrix #' \item feats_to_use = : can be used to select a column name of #' highly variable features, created by (see \code{\link{calculateHVF}}) -#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features +#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually +#' provided features #' } #' By default the number of principle components that we calculate is 100, which #' may not encompass all the variation within the dataset. Setting ncp to NULL #' will calculate all the principle components. #' @export runPCA <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - name = NULL, - feats_to_use = 'hvf', - return_gobject = TRUE, - center = TRUE, - scale_unit = TRUE, - ncp = 100, - method = c('irlba', 'exact', 'random','factominer'), - method_params = BiocParallel::SerialParam(), - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - verbose = TRUE, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # specify name to use for pca - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'pca' - } else { - name = paste0(feat_type,'.','pca') - } - } - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - - expr_values = expr_values[] # extract matrix - - # set max ncp if NULL was provided - if (is.null(ncp)) { - ncp = nrow(expr_values) - } - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - - # do PCA dimension reduction - reduction = match.arg(reduction, c('cells', 'feats')) - - # PCA implementation - method = match.arg(method, c('irlba', 'exact', 'random','factominer')) - - if(reduction == 'cells') { - # PCA on cells - if(method %in% c('irlba', 'exact', 'random')) { - pca_object = .run_pca_biocsingular(x = t_flex(expr_values), - center = center, - scale = scale_unit, - ncp = ncp, - rev = rev, - set_seed = set_seed, - seed_number = seed_number, - BSPARAM = method, - BPPARAM = method_params, - ...) - } else if(method == 'factominer') { - pca_object = .run_pca_factominer(x = t_flex(expr_values), - scale = scale_unit, - ncp = ncp, rev = rev, - set_seed = set_seed, - seed_number = seed_number, - ...) 
- } else { - stop('only PCA methods from the BiocSingular and factominer package have been implemented \n') - } + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + name = NULL, + feats_to_use = "hvf", + return_gobject = TRUE, + center = TRUE, + scale_unit = TRUE, + ncp = 100, + method = c("irlba", "exact", "random", "factominer"), + method_params = BiocParallel::SerialParam(), + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE, + ...) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } else { - # PCA on genes - if(method %in% c('irlba', 'exact', 'random')) { - pca_object = .run_pca_biocsingular(x = expr_values, - center = center, - scale = scale_unit, - ncp = ncp, - rev = rev, - set_seed = set_seed, - seed_number = seed_number, - BSPARAM = method, - BPPARAM = method_params, - ...) - - } else if(method == 'factominer') { - pca_object = .run_pca_factominer(x = expr_values, - scale = scale_unit, ncp = ncp, rev = rev, - set_seed = set_seed, seed_number = seed_number, ...) - } else { - stop('only PCA methods from the irlba and factominer package have been implemented \n') + # specify name to use for pca + if (is.null(name)) { + if (feat_type == "rna") { + name <- "pca" + } else { + name <- paste0(feat_type, ".", "pca") + } } - } + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + values = values, + output = "exprObj" + ) + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + # set max ncp if NULL was provided + if (is.null(ncp)) { + ncp <- nrow(expr_values) + } - if(isTRUE(return_gobject)) { + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } - pca_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'pca') - if(name %in% pca_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } + # do PCA dimension reduction + reduction <- match.arg(reduction, c("cells", "feats")) + + # PCA implementation + method <- match.arg(method, c("irlba", "exact", "random", "factominer")) if (reduction == "cells") { - my_row_names = colnames(expr_values) + # PCA on cells + if (method %in% c("irlba", "exact", "random")) { + pca_object <- .run_pca_biocsingular( + x = t_flex(expr_values), + center = center, + scale = scale_unit, + ncp = ncp, + rev = rev, + set_seed = set_seed, + seed_number = seed_number, + BSPARAM = method, + BPPARAM = method_params, + ... + ) + } else if (method == "factominer") { + pca_object <- .run_pca_factominer( + x = t_flex(expr_values), + scale = scale_unit, + ncp = ncp, rev = rev, + set_seed = set_seed, + seed_number = seed_number, + ... 
+ ) + } else { + stop("only PCA methods from the BiocSingular and factominer + package have been implemented") + } } else { - my_row_names = rownames(expr_values) + # PCA on genes + if (method %in% c("irlba", "exact", "random")) { + pca_object <- .run_pca_biocsingular( + x = expr_values, + center = center, + scale = scale_unit, + ncp = ncp, + rev = rev, + set_seed = set_seed, + seed_number = seed_number, + BSPARAM = method, + BPPARAM = method_params, + ... + ) + } else if (method == "factominer") { + pca_object <- .run_pca_factominer( + x = expr_values, + scale = scale_unit, ncp = ncp, rev = rev, + set_seed = set_seed, seed_number = seed_number, ... + ) + } else { + stop("only PCA methods from the irlba and factominer package have + been implemented") + } } - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - provenance = provenance, - reduction = reduction, - reduction_method = 'pca', - coordinates = pca_object$coords, - misc = list(eigenvalues = pca_object$eigenvalues, - loadings = pca_object$loadings), - my_rownames = my_row_names) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_pca') - return(gobject) + if (isTRUE(return_gobject)) { + pca_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "pca" + ) + + if (name %in% pca_names) { + cat(name, " has already been used, will be overwritten") + } + if (reduction == "cells") { + my_row_names <- colnames(expr_values) + } else { + my_row_names <- rownames(expr_values) + } - } else { - return(pca_object) - } + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + provenance = provenance, + reduction = reduction, + reduction_method = "pca", + coordinates = pca_object$coords, + misc = list( + eigenvalues = pca_object$eigenvalues, + loadings = pca_object$loadings + ), + my_rownames = my_row_names + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_pca") + return(gobject) + } else { + return(pca_object) + } } @@ -526,146 +566,145 @@ runPCA <- function(gobject, #' @param verbose verbosity level #' @keywords internal #' @return list of eigenvalues, loadings and pca coordinates -.run_pca_biocsingular_irlba_projection = function(x, - ncp = 100, - center = TRUE, - scale = TRUE, - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - BPPARAM = BiocParallel::SerialParam(), - random_subset = 500, - verbose = TRUE, - ...) 
{ - - - x = scale(x, center = center, scale = scale) - - min_ncp = min(dim(x)) - - if(ncp >= min_ncp) { - warning("ncp >= minimum dimension of x, will be set to minimum dimension of x - 1") - ncp = min_ncp-1 - } - - # seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - } - on.exit(random_seed(), add = TRUE) - - - - - if(isTRUE(rev)) { - - x = t_flex(x) +.run_pca_biocsingular_irlba_projection <- function(x, + ncp = 100, + center = TRUE, + scale = TRUE, + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + BPPARAM = BiocParallel::SerialParam(), + random_subset = 500, + verbose = TRUE, + ...) { + x <- scale(x, center = center, scale = scale) + + min_ncp <- min(dim(x)) + + if (ncp >= min_ncp) { + warning("ncp >= minimum dimension of x, will be set to minimum + dimension of x - 1") + ncp <- min_ncp - 1 + } - # store cell ID order information - cell_ID_order = rownames(x) + # seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } + on.exit(random_seed(), add = TRUE) - # create random selection - random_selection = sort(sample(1:nrow(x), random_subset)) - subsample_matrix = x[random_selection, ] - if(verbose) wrap_msg('pca random subset: start') - # pca on random selection - pca_res = BiocSingular::runPCA(x = subsample_matrix, - rank = ncp, - center = F, scale = F, - BSPARAM = BiocSingular::IrlbaParam(), - BPPARAM = BPPARAM, - ...) + if (isTRUE(rev)) { + x <- t_flex(x) - if(verbose) wrap_msg('pca random subset: done') - if(verbose) wrap_msg('pca prediction: start') + # store cell ID order information + cell_ID_order <- rownames(x) - # create leftover matrix - leftover_matrix = x[-random_selection, ] + # create random selection + random_selection <- sort(sample(1:nrow(x), random_subset)) + subsample_matrix <- x[random_selection, ] - # predict on leftover matrix - class(pca_res) = 'prcomp' - pca_res$center = FALSE - pca_res$scale = FALSE - project_results = predict(pca_res, leftover_matrix) - if(verbose) wrap_msg('pca prediction: done') + if (verbose) message("pca random subset: start") - # combine subsample + predicted coordinates - coords = rbind(pca_res$x, project_results) - coords = coords[match(cell_ID_order, rownames(coords)), ] + # pca on random selection + pca_res <- BiocSingular::runPCA( + x = subsample_matrix, + rank = ncp, + center = FALSE, scale = FALSE, + BSPARAM = BiocSingular::IrlbaParam(), + BPPARAM = BPPARAM, + ... 
+ ) - # eigenvalues - eigenvalues = pca_res$sdev^2 + if (verbose) message("pca random subset: done") + if (verbose) message("pca prediction: start") - # PC loading - loadings = coords - rownames(loadings) = rownames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) + # create leftover matrix + leftover_matrix <- x[-random_selection, ] - # coordinates - coords = pca_res$rotation - rownames(coords) = colnames(x) - colnames(coords) = paste0('Dim.', 1:ncol(coords)) + # predict on leftover matrix + class(pca_res) <- "prcomp" + pca_res$center <- FALSE + pca_res$scale <- FALSE + project_results <- predict(pca_res, leftover_matrix) - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) + if (verbose) message("pca prediction: done") + # combine subsample + predicted coordinates + coords <- rbind(pca_res$x, project_results) + coords <- coords[match(cell_ID_order, rownames(coords)), ] - } else { + # eigenvalues + eigenvalues <- pca_res$sdev^2 - # store cell ID order information - cell_ID_order = rownames(x) + # PC loading + loadings <- coords + rownames(loadings) <- rownames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) - # create random selection - random_selection = sort(sample(1:nrow(x), random_subset)) - subsample_matrix = x[random_selection, ] + # coordinates + coords <- pca_res$rotation + rownames(coords) <- colnames(x) + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) - if(verbose) wrap_msg('pca random subset: start') + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } else { + # store cell ID order information + cell_ID_order <- rownames(x) - pca_res = BiocSingular::runPCA(x = subsample_matrix, - rank = ncp, - center = F, scale = F, - BSPARAM = BiocSingular::IrlbaParam(), - BPPARAM = BPPARAM, - ...) + # create random selection + random_selection <- sort(sample(1:nrow(x), random_subset)) + subsample_matrix <- x[random_selection, ] - if(verbose) wrap_msg('pca random subset: done') - if(verbose) wrap_msg('pca prediction: start') + if (verbose) message("pca random subset: start") - # create leftover matrix - leftover_matrix = x[-random_selection, ] + pca_res <- BiocSingular::runPCA( + x = subsample_matrix, + rank = ncp, + center = FALSE, scale = FALSE, + BSPARAM = BiocSingular::IrlbaParam(), + BPPARAM = BPPARAM, + ... 
+ ) - # predict on leftover matrix - class(pca_res) = 'prcomp' - pca_res$center = FALSE - pca_res$scale = FALSE - project_results = predict(pca_res, leftover_matrix) + if (verbose) message("pca random subset: done") + if (verbose) message("pca prediction: start") - if(verbose) wrap_msg('pca prediction: done') + # create leftover matrix + leftover_matrix <- x[-random_selection, ] - # combine subsample + predicted coordinates - coords = rbind(pca_res$x, project_results) - coords = coords[match(cell_ID_order, rownames(coords)), ] + # predict on leftover matrix + class(pca_res) <- "prcomp" + pca_res$center <- FALSE + pca_res$scale <- FALSE + project_results <- predict(pca_res, leftover_matrix) - # eigenvalues - eigenvalues = pca_res$sdev^2 + if (verbose) message("pca prediction: done") - # PC loading - loadings = pca_res$rotation - rownames(loadings) = colnames(x) - colnames(loadings) = paste0('Dim.', 1:ncol(loadings)) + # combine subsample + predicted coordinates + coords <- rbind(pca_res$x, project_results) + coords <- coords[match(cell_ID_order, rownames(coords)), ] - # coordinates - colnames(coords) = paste0('Dim.', 1:ncol(coords)) + # eigenvalues + eigenvalues <- pca_res$sdev^2 - result = list(eigenvalues = eigenvalues, loadings = loadings, coords = coords) + # PC loading + loadings <- pca_res$rotation + rownames(loadings) <- colnames(x) + colnames(loadings) <- paste0("Dim.", 1:ncol(loadings)) - } + # coordinates + colnames(coords) <- paste0("Dim.", 1:ncol(coords)) - return(result) + result <- list( + eigenvalues = eigenvalues, loadings = loadings, coords = coords) + } + return(result) } @@ -678,7 +717,8 @@ runPCA <- function(gobject, #' @title runPCAprojection #' @name runPCAprojection -#' @description runs a Principal Component Analysis on a random subet + projection +#' @description runs a Principal Component Analysis on a random +#' subset + projection #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -699,7 +739,8 @@ runPCA <- function(gobject, #' @param verbose verbosity of the function #' @param ... additional parameters for PCA (see details) #' @return giotto object with updated PCA dimension recuction -#' @details See \code{\link[BiocSingular]{runPCA}} and \code{\link[FactoMineR]{PCA}} for more information about other parameters. +#' @details See \code{\link[BiocSingular]{runPCA}} and +#' \code{\link[FactoMineR]{PCA}} for more information about other parameters. #' This PCA implementation is similar to \code{\link{runPCA}}, except that it #' performs PCA on a subset of the cells or features, and predict on the others. #' This can significantly increase speed without sacrificing accuracy too much. 
@@ -707,176 +748,188 @@ runPCA <- function(gobject, #' \item feats_to_use = NULL: will use all features from the selected matrix #' \item feats_to_use = : can be used to select a column name of #' highly variable features, created by (see \code{\link{calculateHVF}}) -#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features +#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually +#' provided features #' } #' @export -runPCAprojection = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - random_subset = 500, - name = 'pca.projection', - feats_to_use = 'hvf', - return_gobject = TRUE, - center = TRUE, - scale_unit = TRUE, - ncp = 100, - method = c('irlba'), - method_params = BiocParallel::SerialParam(), - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - verbose = TRUE, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # specify name to use for pca - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'pca' - } else { - name = paste0(feat_type,'.','pca') - } - } - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - - if(!is.null(slot(gobject, 'h5_file'))) { - expr_path = slot(expr_values, 'exprMat') - - expr_values = HDF5Array::h5mread(filepath = slot(gobject, 'h5_file'), - name = paste0('expression/', - feat_type,'/', - values)) - - expr_dimnames = HDF5Array::h5readDimnames(filepath = slot(gobject, 'h5_file'), - name = paste0('expression/', - feat_type,'/', - values)) - - rownames(expr_values) = expr_dimnames[[1]] - colnames(expr_values) = expr_dimnames[[2]] - - } else { - expr_values = expr_values[] # extract matrix - } - - - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - - # do PCA dimension reduction - reduction = match.arg(reduction, c('cells', 'feats')) - - # PCA implementation - method = match.arg(method, c('irlba')) - - if(reduction == 'cells') { - - # PCA on cells - pca_object = .run_pca_biocsingular_irlba_projection( - x = t_flex(expr_values), - ncp = ncp, - center = center, - scale = scale_unit, - rev = rev, - set_seed = set_seed, - seed_number = seed_number, - BPPARAM = method_params, - random_subset = random_subset, - verbose = verbose, - ... +runPCAprojection <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + random_subset = 500, + name = "pca.projection", + feats_to_use = "hvf", + return_gobject = TRUE, + center = TRUE, + scale_unit = TRUE, + ncp = 100, + method = c("irlba"), + method_params = BiocParallel::SerialParam(), + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE, + ...) 
{ + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit ) - - } else { - - # PCA on features - pca_object = .run_pca_biocsingular_irlba_projection( - x = expr_values, - ncp = ncp, - center = center, - scale = scale_unit, - rev = rev, - set_seed = set_seed, - seed_number = seed_number, - BPPARAM = method_params, - random_subset = random_subset, - verbose = verbose, - ... + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - } - - if(isTRUE(return_gobject)) { - - pca_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'pca') - - if(name %in% pca_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') + # specify name to use for pca + if (is.null(name)) { + if (feat_type == "rna") { + name <- "pca" + } else { + name <- paste0(feat_type, ".", "pca") + } } - if (reduction == "cells") { - my_row_names = colnames(expr_values) + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + + if (!is.null(slot(gobject, "h5_file"))) { + expr_path <- slot(expr_values, "exprMat") + + expr_values <- HDF5Array::h5mread( + filepath = slot(gobject, "h5_file"), + name = paste0( + "expression/", + feat_type, "/", + values + ) + ) + + expr_dimnames <- HDF5Array::h5readDimnames( + filepath = slot(gobject, "h5_file"), + name = paste0( + "expression/", + feat_type, "/", + values + ) + ) + + rownames(expr_values) <- expr_dimnames[[1]] + colnames(expr_values) <- expr_dimnames[[2]] } else { - my_row_names = rownames(expr_values) + expr_values <- expr_values[] # extract matrix } - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - provenance = provenance, - reduction = reduction, - reduction_method = 'pca', - coordinates = pca_object$coords, - misc = list(eigenvalues = pca_object$eigenvalues, - loadings = pca_object$loadings), - my_rownames = my_row_names) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_pca') - return(gobject) + # do PCA dimension reduction + reduction <- match.arg(reduction, c("cells", "feats")) + # PCA implementation + method <- match.arg(method, c("irlba")) - } else { - return(pca_object) - } + if (reduction == "cells") { + # PCA on cells + pca_object <- .run_pca_biocsingular_irlba_projection( + x = t_flex(expr_values), + ncp = ncp, + center = center, + scale = scale_unit, + rev = rev, + set_seed = set_seed, + seed_number = seed_number, + BPPARAM = method_params, + random_subset = random_subset, + verbose = verbose, + ... 
+ ) + } else { + # PCA on features + pca_object <- .run_pca_biocsingular_irlba_projection( + x = expr_values, + ncp = ncp, + center = center, + scale = scale_unit, + rev = rev, + set_seed = set_seed, + seed_number = seed_number, + BPPARAM = method_params, + random_subset = random_subset, + verbose = verbose, + ... + ) + } + if (isTRUE(return_gobject)) { + pca_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "pca" + ) + + if (name %in% pca_names) { + cat(name, " has already been used, will be overwritten") + } + + if (reduction == "cells") { + my_row_names <- colnames(expr_values) + } else { + my_row_names <- rownames(expr_values) + } + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + provenance = provenance, + reduction = reduction, + reduction_method = "pca", + coordinates = pca_object$coords, + misc = list( + eigenvalues = pca_object$eigenvalues, + loadings = pca_object$loadings + ), + my_rownames = my_row_names + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_pca") + return(gobject) + } else { + return(pca_object) + } } @@ -885,7 +938,8 @@ runPCAprojection = function(gobject, #' @title runPCAprojectionBatch #' @name runPCAprojectionBatch -#' @description runs a Principal Component Analysis on multiple random batches + projection +#' @description runs a Principal Component Analysis on multiple random +#' batches + projection #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -906,314 +960,346 @@ runPCAprojection = function(gobject, #' @param seed_number seed number to use #' @param verbose verbosity of the function #' @param ... additional parameters for PCA (see details) -#' @return giotto object with updated PCA dimension recuction -#' @details See \code{\link[BiocSingular]{runPCA}} and \code{\link[FactoMineR]{PCA}} for more information about other parameters. -#' This PCA implementation is similar to \code{\link{runPCA}} and \code{\link{runPCAprojection}}, -#' except that it performs PCA on multiple subsets (batches) of the cells or features, -#' and predict on the others. This can significantly increase speed without sacrificing accuracy too much. +#' @return giotto object with updated PCA dimension reduction +#' @details See \code{\link[BiocSingular]{runPCA}} and +#' \code{\link[FactoMineR]{PCA}} for more information about other parameters. +#' This PCA implementation is similar to \code{\link{runPCA}} and +#' \code{\link{runPCAprojection}}, +#' except that it performs PCA on multiple subsets (batches) of the cells or +#' features, +#' and predict on the others. This can significantly increase speed without +#' sacrificing accuracy too much. 
#' \itemize{ #' \item feats_to_use = NULL: will use all features from the selected matrix #' \item feats_to_use = : can be used to select a column name of #' highly variable features, created by (see \code{\link{calculateHVF}}) -#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features +#' \item feats_to_use = c('geneA', 'geneB', ...): will use all manually +#' provided features #' } #' @export -runPCAprojectionBatch = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - random_subset = 500, - batch_number = 5, - name = 'pca.projection.batch', - feats_to_use = 'hvf', - return_gobject = TRUE, - center = TRUE, - scale_unit = TRUE, - ncp = 100, - method = c('irlba'), - method_params = BiocParallel::SerialParam(), - rev = FALSE, - set_seed = TRUE, - seed_number = 1234, - verbose = TRUE, - ...) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # specify name to use for pca - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'pca' - } else { - name = paste0(feat_type,'.','pca') - } - } - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - - if(!is.null(slot(gobject, 'h5_file'))) { - expr_path = slot(expr_values, 'exprMat') - - expr_values = HDF5Array::h5mread(filepath = slot(gobject, 'h5_file'), - name = paste0('expression/', - feat_type,'/', - values)) - - expr_dimnames = HDF5Array::h5readDimnames(filepath = slot(gobject, 'h5_file'), - name = paste0('expression/', - feat_type,'/', - values)) - - rownames(expr_values) = expr_dimnames[[1]] - colnames(expr_values) = expr_dimnames[[2]] - - } else { - expr_values = expr_values[] # extract matrix - } - - - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - - # do PCA dimension reduction - reduction = match.arg(reduction, c('cells', 'feats')) - - # PCA implementation - method = match.arg(method, c('irlba')) - - if(reduction == 'cells') { - - pca_batch_results = list() - - for(batch in 1:batch_number) { - - if(verbose) wrap_msg('start batch ', batch) - - if(isTRUE(set_seed)) { - seed_batch = seed_number+batch - } else { - seed_batch = NULL - } - +runPCAprojectionBatch <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + random_subset = 500, + batch_number = 5, + name = "pca.projection.batch", + feats_to_use = "hvf", + return_gobject = TRUE, + center = TRUE, + scale_unit = TRUE, + ncp = 100, + method = c("irlba"), + method_params = BiocParallel::SerialParam(), + rev = FALSE, + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE, + ...) 
{ + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # PCA on cells - pca_object = .run_pca_biocsingular_irlba_projection( - x = t_flex(expr_values), - ncp = ncp, - center = center, - scale = scale_unit, - rev = rev, - set_seed = set_seed, - seed_number = seed_batch, - BPPARAM = method_params, - random_subset = random_subset, - verbose = verbose, - ... - ) - - # adjust the sign of the coordinates and loadings vector relative to first batch - # this is necessary for the next averaging step - if(batch == 1) { - pca_batch_results[[batch]] = pca_object - } else { - - for(dimension in 1:ncol(pca_object[['coords']])) { - sum_evaluation = sum(sign(pca_batch_results[[1]][['coords']][1:20, dimension]) * - sign(pca_object[['coords']][1:20, dimension])) - if(sum_evaluation < 0) { - pca_object$coords[, dimension] = -1 * pca_object$coords[, dimension] - pca_object$loadings[, dimension] = -1 * pca_object$loadings[, dimension] - } + # specify name to use for pca + if (is.null(name)) { + if (feat_type == "rna") { + name <- "pca" + } else { + name <- paste0(feat_type, ".", "pca") } - pca_batch_results[[batch]] = pca_object - } } - if(verbose) wrap_msg('start averaging pca results of batches') - - # calculate average eigenvalues, coordinates and loadings - # TODO: test out DT approach, might be faster and more efficient for - # large datasets - - # eigenvalues - eigenvalues_list = lapply(pca_batch_results, FUN = function(x) x$eigenvalues) - eigenvalues_matrix = do.call('cbind', eigenvalues_list) - eigenvalues_mean = rowMeans_flex(eigenvalues_matrix) - - # coordinates - coords_list = lapply(pca_batch_results, FUN = function(x) x$coords) - coords_vector = do.call('c', coords_list) - coords_array = array(data = coords_vector, dim = c(ncol(expr_values), ncp, length(pca_batch_results))) - coords_all = apply(coords_array, MARGIN = c(1:2), function(arr){mean(arr, na.rm=TRUE)}) - rownames(coords_all) = rownames(pca_batch_results[[1]][['coords']]) - colnames(coords_all) = colnames(pca_batch_results[[1]][['coords']]) - - # loadings - loadings_list = lapply(pca_batch_results, FUN = function(x) x$loadings) - loadings_vector = do.call('c', loadings_list) - loadings_array = array(data = loadings_vector, dim = c(nrow(expr_values), ncp, length(pca_batch_results))) - loadings_all = apply(loadings_array, MARGIN = c(1:2), function(arr){mean(arr, na.rm=TRUE)}) - rownames(loadings_all) = rownames(pca_batch_results[[1]][['loadings']]) - colnames(loadings_all) = colnames(pca_batch_results[[1]][['loadings']]) - - - pca_object = list(eigenvalues = eigenvalues_mean, loadings = loadings_all, coords = coords_all) - - - } else { - - - pca_batch_results = list() - - for(batch in 1:batch_number) { - - if(verbose) wrap_msg('start batch ', batch) - - if(isTRUE(set_seed)) { - seed_batch = seed_number+batch - } else { - seed_batch = NULL - } - + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + values = values, + output = "exprObj" + ) - # PCA on features - pca_object = .run_pca_biocsingular_irlba_projection( - x = expr_values, - ncp = ncp, - center = center, - scale = scale_unit, - rev = rev, - set_seed = set_seed, - seed_number = seed_number, - BPPARAM = method_params, 
- random_subset = random_subset, - verbose = verbose, - ... - ) - - - # adjust the sign of the coordinates and loadings vector relative to first batch - # this is necessary for the next averaging step - if(batch == 1) { - pca_batch_results[[batch]] = pca_object - } else { - - for(dimension in 1:ncol(pca_object[['coords']])) { - sum_evaluation = sum(sign(pca_batch_results[[1]][['coords']][1:20, dimension]) * - sign(pca_object[['coords']][1:20, dimension])) - if(sum_evaluation < 0) { - pca_object$coords[, dimension] = -1 * pca_object$coords[, dimension] - pca_object$loadings[, dimension] = -1 * pca_object$loadings[, dimension] - } - } - pca_batch_results[[batch]] = pca_object - } + provenance <- prov(expr_values) + + if (!is.null(slot(gobject, "h5_file"))) { + expr_path <- slot(expr_values, "exprMat") + + expr_values <- HDF5Array::h5mread( + filepath = slot(gobject, "h5_file"), + name = paste0( + "expression/", + feat_type, "/", + values + ) + ) + + expr_dimnames <- HDF5Array::h5readDimnames( + filepath = slot(gobject, "h5_file"), + name = paste0( + "expression/", + feat_type, "/", + values + ) + ) + + rownames(expr_values) <- expr_dimnames[[1]] + colnames(expr_values) <- expr_dimnames[[2]] + } else { + expr_values <- expr_values[] # extract matrix } - if(verbose) wrap_msg('start averaging pca results of batches') - # calculate average eigenvalues, coordinates and loadings - # TODO: test out DT approach, might be faster and more efficient for - # large datasets - # eigenvalues - eigenvalues_list = lapply(pca_batch_results, FUN = function(x) x$eigenvalues) - eigenvalues_matrix = do.call('cbind', eigenvalues_list) - eigenvalues_mean = rowMeans_flex(eigenvalues_matrix) - - # coordinates - coords_list = lapply(pca_batch_results, FUN = function(x) x$coords) - coords_vector = do.call('c', coords_list) - coords_array = array(data = coords_vector, dim = c(ncol(expr_values), ncp, length(pca_batch_results))) - coords_all = apply(coords_array, MARGIN = c(1:2), function(arr){mean(arr, na.rm=TRUE)}) - rownames(coords_all) = rownames(pca_batch_results[[1]][['coords']]) - colnames(coords_all) = colnames(pca_batch_results[[1]][['coords']]) + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } - # loadings - loadings_list = lapply(pca_batch_results, FUN = function(x) x$loadings) - loadings_vector = do.call('c', loadings_list) - loadings_array = array(data = loadings_vector, dim = c(nrow(expr_values), ncp, length(pca_batch_results))) - loadings_all = apply(loadings_array, MARGIN = c(1:2), function(arr){mean(arr, na.rm=TRUE)}) - rownames(loadings_all) = rownames(pca_batch_results[[1]][['loadings']]) - colnames(loadings_all) = colnames(pca_batch_results[[1]][['loadings']]) + # do PCA dimension reduction + reduction <- match.arg(reduction, c("cells", "feats")) - pca_object = list(eigenvalues = eigenvalues_mean, loadings = loadings_all, coords = coords_all) + # PCA implementation + method <- match.arg(method, c("irlba")) + if (reduction == "cells") { + pca_batch_results <- list() + + for (batch in 1:batch_number) { + if (verbose) wrap_msg("start batch ", batch) + + if (isTRUE(set_seed)) { + seed_batch <- seed_number + batch + } else { + seed_batch <- NULL + } + + + # PCA on cells + pca_object <- .run_pca_biocsingular_irlba_projection( + x = t_flex(expr_values), + ncp = ncp, + center = center, + scale = scale_unit, + 
rev = rev, + set_seed = set_seed, + seed_number = seed_batch, + BPPARAM = method_params, + random_subset = random_subset, + verbose = verbose, + ... + ) + + # adjust the sign of the coordinates and loadings vector relative + # to first batch this is necessary for the next averaging step + if (batch == 1) { + pca_batch_results[[batch]] <- pca_object + } else { + for (dimension in 1:ncol(pca_object[["coords"]])) { + sum_evaluation <- sum(sign(pca_batch_results[[1]][[ + "coords"]][1:20, dimension]) * + sign(pca_object[["coords"]][1:20, dimension])) + if (sum_evaluation < 0) { + pca_object$coords[, dimension] <- -1 * pca_object$coords[, dimension] + pca_object$loadings[, dimension] <- -1 * pca_object$loadings[, dimension] + } + } + pca_batch_results[[batch]] <- pca_object + } + } + if (verbose) message("start averaging pca results of batches") + + # calculate average eigenvalues, coordinates and loadings + # TODO: test out DT approach, might be faster and more efficient for + # large datasets + + # eigenvalues + eigenvalues_list <- lapply( + pca_batch_results, FUN = function(x) x$eigenvalues) + eigenvalues_matrix <- do.call("cbind", eigenvalues_list) + eigenvalues_mean <- rowMeans_flex(eigenvalues_matrix) + + # coordinates + coords_list <- lapply(pca_batch_results, FUN = function(x) x$coords) + coords_vector <- do.call("c", coords_list) + coords_array <- array( + data = coords_vector, + dim = c(ncol(expr_values), ncp, length(pca_batch_results))) + coords_all <- apply(coords_array, MARGIN = c(1:2), function(arr) { + mean(arr, na.rm = TRUE) + }) + rownames(coords_all) <- rownames(pca_batch_results[[1]][["coords"]]) + colnames(coords_all) <- colnames(pca_batch_results[[1]][["coords"]]) + + # loadings + loadings_list <- lapply(pca_batch_results, FUN = function(x) x$loadings) + loadings_vector <- do.call("c", loadings_list) + loadings_array <- array( + data = loadings_vector, + dim = c(nrow(expr_values), ncp, length(pca_batch_results))) + loadings_all <- apply(loadings_array, MARGIN = c(1:2), function(arr) { + mean(arr, na.rm = TRUE) + }) + rownames(loadings_all) <- rownames(pca_batch_results[[1]][["loadings"]]) + colnames(loadings_all) <- colnames(pca_batch_results[[1]][["loadings"]]) + + + pca_object <- list( + eigenvalues = eigenvalues_mean, + loadings = loadings_all, coords = coords_all) + } else { + pca_batch_results <- list() + + for (batch in 1:batch_number) { + if (verbose) wrap_msg("start batch ", batch) + + if (isTRUE(set_seed)) { + seed_batch <- seed_number + batch + } else { + seed_batch <- NULL + } + + + # PCA on features + pca_object <- .run_pca_biocsingular_irlba_projection( + x = expr_values, + ncp = ncp, + center = center, + scale = scale_unit, + rev = rev, + set_seed = set_seed, + seed_number = seed_number, + BPPARAM = method_params, + random_subset = random_subset, + verbose = verbose, + ... 
+ ) + + + # adjust the sign of the coordinates and loadings vector relative + # to first batch this is necessary for the next averaging step + if (batch == 1) { + pca_batch_results[[batch]] <- pca_object + } else { + for (dimension in 1:ncol(pca_object[["coords"]])) { + sum_evaluation <- sum(sign(pca_batch_results[[1]][[ + "coords"]][1:20, dimension]) * + sign(pca_object[["coords"]][1:20, dimension])) + if (sum_evaluation < 0) { + pca_object$coords[, dimension] <- -1 * pca_object$coords[, dimension] + pca_object$loadings[, dimension] <- -1 * pca_object$loadings[, dimension] + } + } + pca_batch_results[[batch]] <- pca_object + } + } - } + if (verbose) wrap_msg("start averaging pca results of batches") + + # calculate average eigenvalues, coordinates and loadings + # TODO: test out DT approach, might be faster and more efficient for + # large datasets + + # eigenvalues + eigenvalues_list <- lapply( + pca_batch_results, FUN = function(x) x$eigenvalues) + eigenvalues_matrix <- do.call("cbind", eigenvalues_list) + eigenvalues_mean <- rowMeans_flex(eigenvalues_matrix) + + # coordinates + coords_list <- lapply(pca_batch_results, FUN = function(x) x$coords) + coords_vector <- do.call("c", coords_list) + coords_array <- array( + data = coords_vector, + dim = c(ncol(expr_values), ncp, length(pca_batch_results))) + coords_all <- apply(coords_array, MARGIN = c(1:2), function(arr) { + mean(arr, na.rm = TRUE) + }) + rownames(coords_all) <- rownames(pca_batch_results[[1]][["coords"]]) + colnames(coords_all) <- colnames(pca_batch_results[[1]][["coords"]]) + + # loadings + loadings_list <- lapply(pca_batch_results, FUN = function(x) x$loadings) + loadings_vector <- do.call("c", loadings_list) + loadings_array <- array( + data = loadings_vector, + dim = c(nrow(expr_values), ncp, length(pca_batch_results))) + loadings_all <- apply(loadings_array, MARGIN = c(1:2), function(arr) { + mean(arr, na.rm = TRUE) + }) + rownames(loadings_all) <- rownames(pca_batch_results[[1]][["loadings"]]) + colnames(loadings_all) <- colnames(pca_batch_results[[1]][["loadings"]]) + + + pca_object <- list( + eigenvalues = eigenvalues_mean, + loadings = loadings_all, coords = coords_all) + } - if(return_gobject == TRUE) { + if (return_gobject == TRUE) { + pca_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "pca" + ) - pca_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'pca') + if (name %in% pca_names) { + cat(name, " has already been used, will be overwritten") + } - if(name %in% pca_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } + if (reduction == "cells") { + my_row_names <- colnames(expr_values) + } else { + my_row_names <- rownames(expr_values) + } - if (reduction == "cells") { - my_row_names = colnames(expr_values) + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + provenance = provenance, + reduction = reduction, + reduction_method = "pca", + coordinates = pca_object$coords, + misc = list( + eigenvalues = pca_object$eigenvalues, + loadings = pca_object$loadings + ), + my_rownames = my_row_names + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + ## update parameters used ## + gobject 
<- update_giotto_params(gobject, description = "_pca") + return(gobject) } else { - my_row_names = rownames(expr_values) + return(pca_object) } - - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - provenance = provenance, - reduction = reduction, - reduction_method = 'pca', - coordinates = pca_object$coords, - misc = list(eigenvalues = pca_object$eigenvalues, - loadings = pca_object$loadings), - my_rownames = my_row_names) - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_pca') - return(gobject) - - - - } else { - return(pca_object) - } - - } @@ -1226,7 +1312,8 @@ runPCAprojectionBatch = function(gobject, #' @title screePlot #' @name screePlot -#' @description identify significant principal components (PCs) using an screeplot (a.k.a. elbowplot) +#' @description identify significant principal components (PCs) using an +#' screeplot (a.k.a. elbowplot) #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @inheritParams create_screeplot @@ -1243,139 +1330,162 @@ runPCAprojectionBatch = function(gobject, #' @return ggplot object for scree method #' @details #' Screeplot works by plotting the explained variance of each -#' individual PC in a barplot allowing you to identify which PC provides a significant -#' contribution (a.k.a 'elbow method'). \cr -#' Screeplot will use an available pca object, based on the parameter 'name', or it will -#' create it if it's not available (see \code{\link{runPCA}}) +#' individual PC in a barplot allowing you to identify which PC provides a +#' significant contribution (a.k.a 'elbow method'). \cr +#' Screeplot will use an available pca object, based on the parameter 'name', +#' or it will create it if it's not available (see \code{\link{runPCA}}) #' @export -screePlot = function(gobject, - spat_unit = NULL, - feat_type = NULL, - name = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - method = c('irlba', 'exact', 'random','factominer'), - rev = FALSE, - feats_to_use = NULL, - center = FALSE, - scale_unit = FALSE, - ncp = 100, - ylim = c(0, 20), - verbose = TRUE, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'screePlot', - ...) 
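The batch projections above are combined by averaging eigenvalues, coordinates and loadings, but only after aligning the sign of every principal component with the first batch, since PCA solutions are defined only up to a per-component sign flip. A minimal standalone sketch of that alignment step, assuming two score matrices coords_ref and coords_new of identical dimensions (both names are illustrative, not part of the package API):

    # Flip a component when the signs of its first `n_check` scores
    # anti-correlate with the reference batch (same test as the loop above).
    align_pc_signs <- function(coords_ref, coords_new, n_check = 20) {
        for (dimension in seq_len(ncol(coords_new))) {
            agreement <- sum(sign(coords_ref[1:n_check, dimension]) *
                sign(coords_new[1:n_check, dimension]))
            if (agreement < 0) {
                coords_new[, dimension] <- -coords_new[, dimension]
            }
        }
        coords_new
    }

After alignment, element-wise means across the stacked batch arrays (as computed above with apply()) give the final coordinates and loadings.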
{ - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # specify name to use for screeplot - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'pca' - } else { - name = paste0(feat_type,'.','pca') - } - } - - # select direction of reduction - reduction = match.arg(reduction, c('cells', 'feats')) - pca_obj = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, - reduction_method = 'pca', - name = name, - output = 'dimObj') - - # if pca already exists plot - if(!is.null(pca_obj)) { - if(isTRUE(verbose)) wrap_msg('PCA with name: ', name, ' already exists and will be used for the screeplot \n') - - screeplot = create_screeplot(eigs = slot(pca_obj, 'misc')$eigenvalues, ncp = ncp, ylim = ylim) - - } else { - # if pca doesn't exists, then create pca and then plot - if(isTRUE(verbose)) wrap_msg('PCA with name: ', name, ' does NOT exist, PCA will be done first \n') +screePlot <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + name = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + method = c("irlba", "exact", "random", "factominer"), + rev = FALSE, + feats_to_use = NULL, + center = FALSE, + scale_unit = FALSE, + ncp = 100, + ylim = c(0, 20), + verbose = TRUE, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "screePlot", + ...) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') + # specify name to use for screeplot + if (is.null(name)) { + if (feat_type == "rna") { + name <- "pca" + } else { + name <- paste0(feat_type, ".", "pca") + } + } - provenance = prov(expr_values) - expr_values = expr_values[] # extract matrix + # select direction of reduction + reduction <- match.arg(reduction, c("cells", "feats")) + pca_obj <- getDimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, + reduction_method = "pca", + name = name, + output = "dimObj" + ) - # PCA implementation - biocsingular_methods = c('irlba', 'exact', 'random','factominer') - method = match.arg(method, choices = biocsingular_methods) + # if pca already exists plot + if (!is.null(pca_obj)) { + if (isTRUE(verbose)) + wrap_msg("PCA with name: ", name, + " already exists and will be used for the screeplot") - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } + screeplot <- create_screeplot( + eigs = slot(pca_obj, "misc")$eigenvalues, ncp = ncp, ylim = ylim) + } else { + # if pca doesn't exists, then create pca and then plot + if (isTRUE(verbose)) + wrap_msg("PCA with name: ", name, + " does NOT exist, PCA will be done first") + + # expression values to be used + values <- match.arg( + 
expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + + # PCA implementation + biocsingular_methods <- c("irlba", "exact", "random", "factominer") + method <- match.arg(method, choices = biocsingular_methods) + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } - # reduction of cells - if(reduction == 'cells') { - - # PCA on cells - if(method %in% biocsingular_methods) { - pca_object = .run_pca_biocsingular(x = t_flex(expr_values), - center = center, - scale = scale_unit, - ncp = ncp, - rev = rev, - BSPARAM = method, - BPPARAM = BiocParallel::SerialParam(), - ...) - } else if(method == 'factominer') { - pca_object = .run_pca_factominer(x = t_flex(expr_values), scale = scale_unit, ncp = ncp, rev = rev, ...) - } else { - stop('only PCA methods from the irlba and factominer package have been implemented \n') - } - - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - provenance = provenance, - reduction = reduction, - reduction_method = 'pca', - coordinates = pca_object$coords, - misc = list(eigenvalues = pca_object$eigenvalues, - loadings = pca_object$loadings), - my_rownames = colnames(expr_values)) - - screeplot = create_screeplot(eigs = slot(dimObject, 'misc')$eigenvalues, ncp = ncp, ylim = ylim) + # reduction of cells + if (reduction == "cells") { + # PCA on cells + if (method %in% biocsingular_methods) { + pca_object <- .run_pca_biocsingular( + x = t_flex(expr_values), + center = center, + scale = scale_unit, + ncp = ncp, + rev = rev, + BSPARAM = method, + BPPARAM = BiocParallel::SerialParam(), + ... + ) + } else if (method == "factominer") { + pca_object <- .run_pca_factominer( + x = t_flex(expr_values), + scale = scale_unit, ncp = ncp, rev = rev, ...) 
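As a usage sketch: when no PCA with the requested name is stored, screePlot() computes one on the fly with the chosen method; otherwise it reuses the stored eigenvalues. A hypothetical call, assuming a Giotto object g with normalized expression values (the object name is illustrative):

    g <- runPCA(g, ncp = 100)    # optional: precompute and store the PCA
    screePlot(g, name = "pca", ncp = 50, ylim = c(0, 20))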
+ } else { + stop("only PCA methods from the irlba and factominer package + have been implemented") + } + + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + provenance = provenance, + reduction = reduction, + reduction_method = "pca", + coordinates = pca_object$coords, + misc = list( + eigenvalues = pca_object$eigenvalues, + loadings = pca_object$loadings + ), + my_rownames = colnames(expr_values) + ) + + screeplot <- create_screeplot( + eigs = slot(dimObject, "misc")$eigenvalues, + ncp = ncp, ylim = ylim) + } } - } - - return(plot_output_handler( - gobject = gobject, - plot_object = screeplot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + return(plot_output_handler( + gobject = gobject, + plot_object = screeplot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -1391,58 +1501,67 @@ screePlot = function(gobject, #' @return ggplot #' @examples #' \dontrun{ -#' dr = GiottoData::loadSubObjectMini('dimObj') -#' scree = create_screeplot(methods::slot(dr, 'misc')$eigenvalues) +#' dr <- GiottoData::loadSubObjectMini("dimObj") +#' scree <- create_screeplot(methods::slot(dr, "misc")$eigenvalues) #' scree #' } #' @export -create_screeplot = function(eigs, ncp = 20, ylim = c(0, 20)) { - - checkmate::assert_numeric(eigs) - checkmate::assert_numeric(ncp, len = 1L) - checkmate::assert_numeric(ylim, len = 2L) - - # DT vars - PC = NULL - - eigs = sort(eigs, decreasing = TRUE) +create_screeplot <- function(eigs, ncp = 20, ylim = c(0, 20)) { + checkmate::assert_numeric(eigs) + checkmate::assert_numeric(ncp, len = 1L) + checkmate::assert_numeric(ylim, len = 2L) - # variance explained - var_expl = eigs/sum(eigs)*100 - var_expl_cum = cumsum(eigs)/sum(eigs)*100 + # DT vars + PC <- NULL - # create data.table - screeDT = data.table::data.table('PC' = paste0('PC.', seq_along(var_expl)), - 'var_expl' = var_expl, - 'var_expl_cum' = var_expl_cum) - screeDT[, 'PC' := factor(PC, levels = PC)] + eigs <- sort(eigs, decreasing = TRUE) - max_ncp = length(eigs) - ncp = ifelse(ncp > max_ncp, max_ncp, ncp) + # variance explained + var_expl <- eigs / sum(eigs) * 100 + var_expl_cum <- cumsum(eigs) / sum(eigs) * 100 - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_bw() - pl = pl + ggplot2::geom_bar(data = screeDT[1:ncp], ggplot2::aes(x = PC, y = var_expl), stat = 'identity') - pl = pl + ggplot2::coord_cartesian(ylim = ylim) - pl = pl + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1, vjust = 1)) - pl = pl + ggplot2::labs(x = '', y = '% of variance explained per PC') - - cpl = ggplot2::ggplot() - cpl = cpl + ggplot2::theme_bw() - cpl = cpl + ggplot2::geom_bar(data = screeDT[1:ncp], ggplot2::aes(x = PC, y = var_expl_cum), stat = 'identity') - cpl = cpl + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1, vjust = 1)) - cpl = cpl + ggplot2::labs(x = '', y = 'cumulative % of variance explained') - - savelist = list(pl, cpl) - - ## combine plots with cowplot - combo_plot <- cowplot::plot_grid(plotlist = savelist, - ncol = 1, - rel_heights = c(1), - rel_widths = c(1), - align = 'v') + # create data.table + screeDT <- data.table::data.table( + "PC" = paste0("PC.", seq_along(var_expl)), + "var_expl" = var_expl, + "var_expl_cum" = var_expl_cum + ) + screeDT[, "PC" := factor(PC, levels = 
PC)] + + max_ncp <- length(eigs) + ncp <- ifelse(ncp > max_ncp, max_ncp, ncp) + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_bar( + data = screeDT[1:ncp], + ggplot2::aes(x = PC, y = var_expl), stat = "identity") + pl <- pl + ggplot2::coord_cartesian(ylim = ylim) + pl <- pl + ggplot2::theme( + axis.text.x = ggplot2::element_text(angle = 45, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::labs(x = "", y = "% of variance explained per PC") + + cpl <- ggplot2::ggplot() + cpl <- cpl + ggplot2::theme_bw() + cpl <- cpl + ggplot2::geom_bar( + data = screeDT[1:ncp], + ggplot2::aes(x = PC, y = var_expl_cum), stat = "identity") + cpl <- cpl + ggplot2::theme(axis.text.x = ggplot2::element_text( + angle = 45, hjust = 1, vjust = 1)) + cpl <- cpl + ggplot2::labs(x = "", y = "cumulative % of variance explained") + + savelist <- list(pl, cpl) + + ## combine plots with cowplot + combo_plot <- cowplot::plot_grid( + plotlist = savelist, + ncol = 1, + rel_heights = c(1), + rel_widths = c(1), + align = "v" + ) - return(combo_plot) + return(combo_plot) } @@ -1469,92 +1588,110 @@ create_screeplot = function(eigs, ncp = 20, ylim = c(0, 20)) { #' @param verbose show progress of jackstraw method #' @return ggplot object for jackstraw method #' @details -#' The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}} function. By -#' systematically permuting genes it identifies robust, and thus significant, PCs. +#' The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}} +#' function. By systematically permuting genes it identifies robust, and thus +#' significant, PCs. #' \cr #' @export -jackstrawPlot = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - feats_to_use = NULL, - center = FALSE, - scale_unit = FALSE, - ncp = 20, - ylim = c(0, 1), - iter = 10, - threshold = 0.01, - verbose = TRUE, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'jackstrawPlot') { - - package_check(pkg_name = "jackstraw", repository = "CRAN") - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # print message with information # - if(verbose) message("using 'jackstraw' to identify significant PCs If used in published research, please cite: - Neo Christopher Chung and John D. Storey (2014). - 'Statistical significance of variables driving systematic variation in high-dimensional data. 
Bioinformatics") - - # select direction of reduction - reduction = match.arg(reduction, c('cells', 'feats')) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - # reduction of cells - if(reduction == 'cells') { - - if(scale_unit == TRUE | center == TRUE) { - expr_values = t_flex(scale(t_flex(expr_values), center = center, scale = scale_unit)) +jackstrawPlot <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + feats_to_use = NULL, + center = FALSE, + scale_unit = FALSE, + ncp = 20, + ylim = c(0, 1), + iter = 10, + threshold = 0.01, + verbose = TRUE, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "jackstrawPlot") { + package_check(pkg_name = "jackstraw", repository = "CRAN") + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # print message with information # + if (verbose) + message("using 'jackstraw' to identify significant PCs If used in + published research, please cite: + Neo Christopher Chung and John D. Storey (2014). + 'Statistical significance of variables driving systematic variation in + high-dimensional data. 
Bioinformatics") + + # select direction of reduction + reduction <- match.arg(reduction, c("cells", "feats")) + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } + + # reduction of cells + if (reduction == "cells") { + if (scale_unit == TRUE | center == TRUE) { + expr_values <- t_flex(scale( + t_flex(expr_values), center = center, scale = scale_unit)) + } + + jtest <- jackstraw::permutationPA( + dat = as.matrix(expr_values), + B = iter, threshold = threshold, verbose = verbose) + + ## results ## + nr_sign_components <- jtest$r + if (verbose) + cat("number of estimated significant components: ", + nr_sign_components) + final_results <- jtest$p + jackplot <- create_jackstrawplot( + jackstraw_data = final_results, + ncp = ncp, ylim = ylim, threshold = threshold) } - jtest = jackstraw::permutationPA(dat = as.matrix(expr_values), B = iter, threshold = threshold, verbose = verbose) - - ## results ## - nr_sign_components = jtest$r - if(verbose) cat('number of estimated significant components: ', nr_sign_components, '\n') - final_results = jtest$p - jackplot = create_jackstrawplot(jackstraw_data = final_results, ncp = ncp, ylim = ylim, threshold = threshold) - - } - - return(plot_output_handler( - gobject = gobject, - plot_object = jackplot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + return(plot_output_handler( + gobject = gobject, + plot_object = jackplot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -1569,36 +1706,38 @@ jackstrawPlot = function(gobject, #' @keywords internal #' @return ggplot #' @export -create_jackstrawplot = function(jackstraw_data, - ncp = 20, - ylim = c(0, 1), - threshold = 0.01) { - - checkmate::assert_numeric(ncp, len = 1L) - checkmate::assert_numeric(ylim, len = 2L) - checkmate::assert_numeric(threshold, len = 1L) - - # data.table variables - PC = p.val = NULL - - testDT = data.table::data.table( - PC = paste0('PC.', seq_along(jackstraw_data)), - p.val = jackstraw_data - ) - testDT[, 'PC' := factor(PC, levels = PC)] - testDT[, 'sign' := ifelse(p.val <= threshold, 'sign', 'n.s.')] - - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_bw() - pl = pl + ggplot2::geom_point(data = testDT[1:ncp], ggplot2::aes(x = PC, y = p.val, fill = sign), shape = 21) - pl = pl + ggplot2::scale_fill_manual(values = c('n.s.' 
= 'lightgrey', 'sign' = 'darkorange')) - pl = pl + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1, vjust = 1)) - pl = pl + ggplot2::coord_cartesian(ylim = ylim) - pl = pl + ggplot2::theme(panel.grid.major.x = ggplot2::element_blank()) - pl = pl + ggplot2::labs(x = '', y = 'p-value per PC') - - return(pl) +create_jackstrawplot <- function(jackstraw_data, + ncp = 20, + ylim = c(0, 1), + threshold = 0.01) { + checkmate::assert_numeric(ncp, len = 1L) + checkmate::assert_numeric(ylim, len = 2L) + checkmate::assert_numeric(threshold, len = 1L) + # data.table variables + PC <- p.val <- NULL + + testDT <- data.table::data.table( + PC = paste0("PC.", seq_along(jackstraw_data)), + p.val = jackstraw_data + ) + testDT[, "PC" := factor(PC, levels = PC)] + testDT[, "sign" := ifelse(p.val <= threshold, "sign", "n.s.")] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_point( + data = testDT[1:ncp], + ggplot2::aes(x = PC, y = p.val, fill = sign), shape = 21) + pl <- pl + ggplot2::scale_fill_manual( + values = c("n.s." = "lightgrey", "sign" = "darkorange")) + pl <- pl + ggplot2::theme( + axis.text.x = ggplot2::element_text(angle = 45, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::coord_cartesian(ylim = ylim) + pl <- pl + ggplot2::theme(panel.grid.major.x = ggplot2::element_blank()) + pl <- pl + ggplot2::labs(x = "", y = "p-value per PC") + + return(pl) } @@ -1629,166 +1768,166 @@ create_jackstrawplot = function(jackstraw_data, #' @param jack_ylim y-axis limits on jackstraw plot #' @param verbose be verbose #' @return ggplot object for scree method and maxtrix of p-values for jackstraw -#' @details Two different methods can be used to assess the number of relevant or significant -#' prinicipal components (PC's). \cr +#' @details Two different methods can be used to assess the number of relevant +#' or significant prinicipal components (PC's). \cr #' 1. Screeplot works by plotting the explained variance of each -#' individual PC in a barplot allowing you to identify which PC provides a significant +#' individual PC in a barplot allowing you to identify which PC provides a +#' significant #' contribution (a.k.a. 'elbow method'). \cr -#' 2. The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}} function. By -#' systematically permuting genes it identifies robust, and thus significant, PCs. +#' 2. The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}} +#' function. By systematically permuting genes it identifies robust, and thus +#' significant, PCs. 
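Both assessment modes are wrapped by signPCA() below, which dispatches to screePlot() or jackstrawPlot(). Hypothetical calls for each mode, assuming a Giotto object g (the object name is illustrative):

    signPCA(g, method = "screeplot", ncp = 50, scree_ylim = c(0, 10))
    signPCA(g, method = "jackstraw", ncp = 20, jack_iter = 10, jack_threshold = 0.01)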
#' \cr #' @export signPCA <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - name = NULL, - method = c('screeplot', 'jackstraw'), - expression_values = c("normalized", "scaled", "custom"), - reduction = c("cells", "feats"), - pca_method = c('irlba', 'factominer'), - rev = FALSE, - feats_to_use = NULL, - center = TRUE, - scale_unit = TRUE, - ncp = 50, - scree_ylim = c(0,10), - jack_iter = 10, - jack_threshold = 0.01, - jack_ylim = c(0, 1), - verbose = TRUE, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'signPCA') { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # specify name to use - if(!is.null(name)) { - if(feat_type == 'rna') { - name = 'pca' - } else { - name = paste0(feat_type,'.','pca') - } - } - - # select method - method = match.arg(method, choices = c('screeplot', 'jackstraw')) - - # select PCA method - pca_method = match.arg(pca_method, choices = c('irlba', 'factominer')) - - # select direction of reduction - reduction = match.arg(reduction, c('cells', 'feats')) - - # expression values to be used - values = match.arg( - expression_values, - unique(c('normalized', 'scaled', 'custom', expression_values)) - ) - expr_values = get_expression_values( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix' - ) - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose + feat_type = NULL, + spat_unit = NULL, + name = NULL, + method = c("screeplot", "jackstraw"), + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + pca_method = c("irlba", "factominer"), + rev = FALSE, + feats_to_use = NULL, + center = TRUE, + scale_unit = TRUE, + ncp = 50, + scree_ylim = c(0, 10), + jack_iter = 10, + jack_threshold = 0.01, + jack_ylim = c(0, 1), + verbose = TRUE, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "signPCA") { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit ) - } - - # reduction of cells - if(reduction == 'cells') { - - if(method == 'screeplot') { - - screeplot = screePlot( + feat_type <- set_default_feat_type( gobject = gobject, spat_unit = spat_unit, - feat_type = feat_type, - name = name, - expression_values = values, - reduction = reduction, - feats_to_use = feats_to_use, - center = center, - scale_unit = scale_unit, - ncp = ncp, - rev = rev, - method = pca_method, - ylim = scree_ylim, - verbose = verbose, - show_plot = FALSE, - return_plot = TRUE, - save_plot = FALSE, - save_param = list(), - default_save_name = 'screePlot' - ) - - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = screeplot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + feat_type = feat_type + ) + # specify name to use + if (!is.null(name)) { + if (feat_type == "rna") { + name <- "pca" + } else { + name <- paste0(feat_type, ".", "pca") + } + } + + # select method + method <- match.arg(method, choices = c("screeplot", 
"jackstraw")) - } else if(method == 'jackstraw') { + # select PCA method + pca_method <- match.arg(pca_method, choices = c("irlba", "factominer")) - jackplot = jackstrawPlot( + # select direction of reduction + reduction <- match.arg(reduction, c("cells", "feats")) + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values)) + ) + expr_values <- getExpression( gobject = gobject, spat_unit = spat_unit, feat_type = feat_type, - expression_values = values, - reduction = reduction, - feats_to_use = feats_to_use, - center = center, - scale_unit = scale_unit, - ncp = ncp, - ylim = jack_ylim, - iter = jack_iter, - threshold = jack_threshold, - verbose = verbose, - show_plot = FALSE, - return_plot = TRUE, - save_plot = FALSE, - save_param = list(), - default_save_name = 'jackstrawPlot' - ) - - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = jackplot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = jackplot # TODO potentially return all results instead - )) + values = values, + output = "matrix" + ) + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) } - } else { - cat('gene reduction not yet implemented') - } + # reduction of cells + if (reduction == "cells") { + if (method == "screeplot") { + screeplot <- screePlot( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = name, + expression_values = values, + reduction = reduction, + feats_to_use = feats_to_use, + center = center, + scale_unit = scale_unit, + ncp = ncp, + rev = rev, + method = pca_method, + ylim = scree_ylim, + verbose = verbose, + show_plot = FALSE, + return_plot = TRUE, + save_plot = FALSE, + save_param = list(), + default_save_name = "screePlot" + ) + + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = screeplot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) + } else if (method == "jackstraw") { + jackplot <- jackstrawPlot( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + reduction = reduction, + feats_to_use = feats_to_use, + center = center, + scale_unit = scale_unit, + ncp = ncp, + ylim = jack_ylim, + iter = jack_iter, + threshold = jack_threshold, + verbose = verbose, + show_plot = FALSE, + return_plot = TRUE, + save_plot = FALSE, + save_param = list(), + default_save_name = "jackstrawPlot" + ) + + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = jackplot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = jackplot + # TODO potentially return all results instead + )) + } + } else { + message("gene reduction not yet implemented") + } } @@ -1825,7 +1964,8 @@ signPCA <- function(gobject, #' @param seed_number seed number to use #' @param verbose verbosity of function #' @param toplevel_params parameters to extract -#' @inheritDotParams uwot::umap -X -n_neighbors -n_components -n_epochs -min_dist -n_threads -spread -seed -scale -pca 
-pca_center -pca_method +#' @inheritDotParams uwot::umap -X -n_neighbors -n_components -n_epochs +#' -min_dist -n_threads -spread -seed -scale -pca -pca_center -pca_method #' @return giotto object with updated UMAP dimension reduction #' @details See \code{\link[uwot]{umap}} for more information about these and #' other parameters. @@ -1841,208 +1981,217 @@ signPCA <- function(gobject, #' } #' @export runUMAP <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - dim_reduction_to_use = 'pca', - dim_reduction_name = NULL, - dimensions_to_use = 1:10, - name = NULL, - feats_to_use = NULL, - return_gobject = TRUE, - n_neighbors = 40, - n_components = 2, - n_epochs = 400, - min_dist = 0.01, - n_threads = NA, - spread = 5, - set_seed = TRUE, - seed_number = 1234L, - verbose = TRUE, - toplevel_params = 2L, - ...) { - - # NSE vars - cell_ID = NULL - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - reduction = match.arg(reduction, choices = c('cells', 'feats')) - - - # specify dim_reduction_name to use for pca input for umap - if(!is.null(dim_reduction_to_use)) { - if(dim_reduction_to_use == 'pca') { - if(is.null(dim_reduction_name)) { - if(feat_type == 'rna') { - dim_reduction_name = 'pca' - } else { - dim_reduction_name = paste0(feat_type,'.','pca') - } - } - } - } + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + dim_reduction_to_use = "pca", + dim_reduction_name = NULL, + dimensions_to_use = 1:10, + name = NULL, + feats_to_use = NULL, + return_gobject = TRUE, + n_neighbors = 40, + n_components = 2, + n_epochs = 400, + min_dist = 0.01, + n_threads = NA, + spread = 5, + set_seed = TRUE, + seed_number = 1234L, + verbose = TRUE, + toplevel_params = 2L, + ...) 
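A hypothetical runUMAP() call matching the defaults above, assuming a Giotto object g that already holds a 'pca' dimension reduction (the object name is illustrative):

    g <- runUMAP(g,
        dim_reduction_to_use = "pca",
        dimensions_to_use = 1:10,
        n_neighbors = 40, min_dist = 0.01, spread = 5
    )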
{ + # NSE vars + cell_ID <- NULL + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + reduction <- match.arg(reduction, choices = c("cells", "feats")) - # specify name to use for umap - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'umap' - } else { - name = paste0(feat_type,'.','umap') + # specify dim_reduction_name to use for pca input for umap + if (!is.null(dim_reduction_to_use)) { + if (dim_reduction_to_use == "pca") { + if (is.null(dim_reduction_name)) { + if (feat_type == "rna") { + dim_reduction_name <- "pca" + } else { + dim_reduction_name <- paste0(feat_type, ".", "pca") + } + } + } } - } - - - # set cores to use - n_threads = determine_cores(cores = n_threads) - - ## umap on cells ## - if(reduction == 'cells') { - - ## using dimension reduction ## - if(!is.null(dim_reduction_to_use)) { - - ## TODO: check if reduction exists - dimObj_to_use = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'dimObj') - - provenance = prov(dimObj_to_use) - matrix_to_use = dimObj_to_use[] - - if (any(!dimensions_to_use %in% seq(ncol(matrix_to_use)))) { - warning(wrap_txt( - 'dimensions_to_use requested is outside what is available.', - paste0('(1 to', ncol(matrix_to_use), ')'), - 'Ignoring dimensions_to_use that are outside the range.' - )) - dimensions_to_use = dimensions_to_use[dimensions_to_use %in% seq(ncol(matrix_to_use))] - } - - matrix_to_use = matrix_to_use[, dimensions_to_use] - - - - } else { - - ## using original matrix ## - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - provenance = prov(expr_values) - expr_values = expr_values[] # extract matrix - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - matrix_to_use = t_flex(expr_values) - } - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) - on.exit({GiottoUtils::random_seed(set.seed = TRUE)}, add = TRUE) + # specify name to use for umap + if (is.null(name)) { + if (feat_type == "rna") { + name <- "umap" + } else { + name <- paste0(feat_type, ".", "umap") + } } - ## run umap ## - uwot_clus <- uwot::umap(X = matrix_to_use, # as.matrix(matrix_to_use) necessary? - n_neighbors = n_neighbors, - n_components = n_components, - n_epochs = n_epochs, - min_dist = min_dist, - n_threads = n_threads, - spread = spread, - ...) 
- - uwot_clus_pos_DT = data.table::as.data.table(uwot_clus) - uwot_clus_pos_DT[, cell_ID := rownames(matrix_to_use)] - - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) - } + # set cores to use + n_threads <- determine_cores(cores = n_threads) + ## umap on cells ## + if (reduction == "cells") { + ## using dimension reduction ## + if (!is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + dimObj_to_use <- get_dimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "dimObj" + ) + + provenance <- prov(dimObj_to_use) + matrix_to_use <- dimObj_to_use[] + + if (any(!dimensions_to_use %in% seq(ncol(matrix_to_use)))) { + warning(wrap_txt( + "dimensions_to_use requested is outside what is available.", + paste0("(1 to", ncol(matrix_to_use), ")"), + "Ignoring dimensions_to_use that are outside the range." + )) + dimensions_to_use <- dimensions_to_use[ + dimensions_to_use %in% seq(ncol(matrix_to_use))] + } + + matrix_to_use <- matrix_to_use[, dimensions_to_use] + } else { + ## using original matrix ## + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } + + matrix_to_use <- t_flex(expr_values) + } - if(return_gobject == TRUE) { + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + on.exit( + { + GiottoUtils::random_seed(set.seed = TRUE) + }, + add = TRUE + ) + } - umap_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'umap') + ## run umap ## + uwot_clus <- uwot::umap( + X = matrix_to_use, # as.matrix(matrix_to_use) necessary? + n_neighbors = n_neighbors, + n_components = n_components, + n_epochs = n_epochs, + min_dist = min_dist, + n_threads = n_threads, + spread = spread, + ... 
+ ) + + uwot_clus_pos_DT <- data.table::as.data.table(uwot_clus) + + uwot_clus_pos_DT[, cell_ID := rownames(matrix_to_use)] + + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } - if(name %in% umap_names) { - message('\n ', name, ' has already been used, will be overwritten \n') - } + if (return_gobject == TRUE) { + umap_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "umap" + ) - coordinates = uwot_clus - rownames(coordinates) = rownames(matrix_to_use) + if (name %in% umap_names) { + message(name, " has already been used, will be overwritten") + } - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - reduction = reduction, - provenance = provenance, - reduction_method = 'umap', - coordinates = coordinates, - misc = NULL) + coordinates <- uwot_clus + rownames(coordinates) <- rownames(matrix_to_use) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + reduction = reduction, + provenance = provenance, + reduction_method = "umap", + coordinates = coordinates, + misc = NULL + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, + dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - ## update parameters used ## - gobject = update_giotto_params(gobject, - description = '_umap', - return_gobject = TRUE, - toplevel = toplevel_params) - return(gobject) - } else { - return(uwot_clus_pos_DT) + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_umap", + return_gobject = TRUE, + toplevel = toplevel_params + ) + return(gobject) + } else { + return(uwot_clus_pos_DT) + } + } else if (reduction == "feats") { + message("Feats reduction is not yet implemented") } - - - - - } else if(reduction == 'feats') { - - message('\n Feats reduction is not yet implemented \n') - - } - } @@ -2078,7 +2227,8 @@ runUMAP <- function(gobject, #' @param toplevel_params parameters to extract #' @param ... additional UMAP parameters #' @return giotto object with updated UMAP dimension reduction -#' @details See \code{\link[uwot]{umap}} for more information about these and other parameters. +#' @details See \code{\link[uwot]{umap}} for more information about these and +#' other parameters. #' \itemize{ #' \item Input for UMAP dimension reduction can be another dimension reduction (default = 'pca') #' \item To use gene expression as input set dim_reduction_to_use = NULL @@ -2087,217 +2237,221 @@ runUMAP <- function(gobject, #' \item multiple UMAP results can be stored by changing the \emph{name} of the analysis #' } #' @export -runUMAPprojection = function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - dim_reduction_to_use = 'pca', - dim_reduction_name = NULL, - dimensions_to_use = 1:10, - random_subset = 500, - name = NULL, - feats_to_use = NULL, - return_gobject = TRUE, - n_neighbors = 40, - n_components = 2, - n_epochs = 400, - min_dist = 0.01, - n_threads = NA, - spread = 5, - set_seed = TRUE, - seed_number = 1234, - verbose = T, - toplevel_params = 2, - ...) 
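runUMAPprojection() fits UMAP on a random subset of cells and then projects the remaining cells onto that model, which scales better for large datasets. A hypothetical call, assuming a Giotto object g with a 'pca' reduction (the object name is illustrative):

    g <- runUMAPprojection(g,
        dim_reduction_to_use = "pca",
        dimensions_to_use = 1:10,
        random_subset = 500,
        name = "umap.projection"
    )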
{ - - # NSE vars - cell_ID = NULL - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - reduction = match.arg(reduction, choices = c('cells', 'feats')) - - - # specify dim_reduction_name to use for pca input for umap - if(!is.null(dim_reduction_to_use)) { - if(dim_reduction_to_use == 'pca') { - if(is.null(dim_reduction_name)) { - if(feat_type == 'rna') { - dim_reduction_name = 'pca' - } else { - dim_reduction_name = paste0(feat_type,'.','pca') - } - } - } - } +runUMAPprojection <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + dim_reduction_to_use = "pca", + dim_reduction_name = NULL, + dimensions_to_use = 1:10, + random_subset = 500, + name = NULL, + feats_to_use = NULL, + return_gobject = TRUE, + n_neighbors = 40, + n_components = 2, + n_epochs = 400, + min_dist = 0.01, + n_threads = NA, + spread = 5, + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE, + toplevel_params = 2, + ...) { + # NSE vars + cell_ID <- NULL + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + reduction <- match.arg(reduction, choices = c("cells", "feats")) - # specify name to use for umap - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'umap.projection' - } else { - name = paste0(feat_type,'.','umap.projection') + # specify dim_reduction_name to use for pca input for umap + if (!is.null(dim_reduction_to_use)) { + if (dim_reduction_to_use == "pca") { + if (is.null(dim_reduction_name)) { + if (feat_type == "rna") { + dim_reduction_name <- "pca" + } else { + dim_reduction_name <- paste0(feat_type, ".", "pca") + } + } + } } - } - - - # set cores to use - n_threads = determine_cores(cores = n_threads) - ## umap on cells ## - if(reduction == 'cells') { - ## using dimension reduction ## - if(!is.null(dim_reduction_to_use)) { - - ## TODO: check if reduction exists - dimObj_to_use = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'dimObj') - - provenance = prov(dimObj_to_use) - matrix_to_use = dimObj_to_use[] - - matrix_to_use = matrix_to_use[, dimensions_to_use] - - } else { - - ## using original matrix ## - # expression values to be used - values = match.arg( - expression_values, - unique(c('normalized', 'scaled', 'custom', expression_values)) - ) - - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - expr_values = expr_values[] # extract matrix - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - matrix_to_use = t_flex(expr_values) - } - - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) + # specify name to use for umap + if (is.null(name)) { + if (feat_type == "rna") { + name <- "umap.projection" + } else { + name <- paste0(feat_type, ".", "umap.projection") + } } 
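The projection itself relies on uwot's model reuse: fit on the subsample with ret_model = TRUE, then place the held-out cells with uwot::umap_transform(), as the code below does. A minimal standalone sketch of this pattern, assuming a numeric cells-by-dimensions matrix m with rownames (the matrix name is illustrative):

    idx    <- sort(sample(nrow(m), 500))              # random subset of cells
    model  <- uwot::umap(m[idx, ], ret_model = TRUE)  # keep the fitted model
    rest   <- uwot::umap_transform(m[-idx, ], model = model)
    coords <- rbind(model$embedding, rest)
    coords <- coords[match(rownames(m), rownames(coords)), ]  # restore cell order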
- ## run umap ## - cell_ID_order = rownames(matrix_to_use) - - # create random selection - random_selection = sort(sample(1:nrow(matrix_to_use), random_subset)) - subsample_matrix = matrix_to_use[random_selection, ] - - uwot_clus_subset <- uwot::umap( - X = subsample_matrix, - n_neighbors = n_neighbors, - n_components = n_components, - n_epochs = n_epochs, - min_dist = min_dist, - n_threads = n_threads, - spread = spread, - ret_model = TRUE, - ... - ) + # set cores to use + n_threads <- determine_cores(cores = n_threads) - # create leftover matrix - leftover_matrix = matrix_to_use[-random_selection, ] - - # make prediction on leftover matrix - uwot_clus_pred = uwot::umap_transform( - X = leftover_matrix, model = uwot_clus_subset - ) + ## umap on cells ## + if (reduction == "cells") { + ## using dimension reduction ## + if (!is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + dimObj_to_use <- getDimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "dimObj" + ) + + provenance <- prov(dimObj_to_use) + matrix_to_use <- dimObj_to_use[] + + matrix_to_use <- matrix_to_use[, dimensions_to_use] + } else { + ## using original matrix ## + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values)) + ) + + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } + + matrix_to_use <- t_flex(expr_values) + } - # combine subset and prediction - coords_umap = rbind(uwot_clus_subset$embedding, uwot_clus_pred) - coords_umap = coords_umap[match(cell_ID_order, rownames(coords_umap)), ] + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } - coords_umap_DT = data.table::as.data.table(coords_umap) - coords_umap_DT[, cell_ID := rownames(coords_umap)] - # exit seed - if(isTRUE(set_seed)) { - set.seed(seed = Sys.time()) + ## run umap ## + cell_ID_order <- rownames(matrix_to_use) + + # create random selection + random_selection <- sort(sample(1:nrow(matrix_to_use), random_subset)) + subsample_matrix <- matrix_to_use[random_selection, ] + + uwot_clus_subset <- uwot::umap( + X = subsample_matrix, + n_neighbors = n_neighbors, + n_components = n_components, + n_epochs = n_epochs, + min_dist = min_dist, + n_threads = n_threads, + spread = spread, + ret_model = TRUE, + ... 
+ ) + + # create leftover matrix + leftover_matrix <- matrix_to_use[-random_selection, ] + + # make prediction on leftover matrix + uwot_clus_pred <- uwot::umap_transform( + X = leftover_matrix, model = uwot_clus_subset + ) + + # combine subset and prediction + coords_umap <- rbind(uwot_clus_subset$embedding, uwot_clus_pred) + coords_umap <- coords_umap[ + match(cell_ID_order, rownames(coords_umap)), ] + + coords_umap_DT <- data.table::as.data.table(coords_umap) + coords_umap_DT[, cell_ID := rownames(coords_umap)] + + # exit seed + if (isTRUE(set_seed)) { + set.seed(seed = Sys.time()) + } + } else if (reduction == "feats") { + message("Feats reduction is not yet implemented") } - } else if(reduction == 'feats') { - message('\n Feats reduction is not yet implemented \n') - } - + if (isTRUE(return_gobject)) { + umap_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "umap" + ) + if (name %in% umap_names) { + message(name, " has already been used, will be overwritten") + } - if(isTRUE(return_gobject)) { - - umap_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'umap') - if(name %in% umap_names) { - message('\n ', name, ' has already been used, will be overwritten \n') + coordinates <- coords_umap + + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + reduction = reduction, + provenance = provenance, + reduction_method = "umap", + coordinates = coordinates, + misc = NULL + ) + + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + ## update parameters used ## + gobject <- update_giotto_params( + gobject, + description = "_umap", + return_gobject = TRUE, + toplevel = toplevel_params + ) + return(gobject) + } else { + return(coords_umap_DT) } - - - coordinates = coords_umap - - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - reduction = reduction, - provenance = provenance, - reduction_method = 'umap', - coordinates = coordinates, - misc = NULL) - - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - ## update parameters used ## - gobject = update_giotto_params( - gobject, - description = '_umap', - return_gobject = TRUE, - toplevel = toplevel_params - ) - return(gobject) - - } else { - return(coords_umap_DT) - } } @@ -2331,7 +2485,8 @@ runUMAPprojection = function(gobject, #' @param verbose verbosity of the function #' @param ... additional tSNE parameters #' @return giotto object with updated tSNE dimension recuction -#' @details See \code{\link[Rtsne]{Rtsne}} for more information about these and other parameters. \cr +#' @details See \code{\link[Rtsne]{Rtsne}} for more information about these and +#' other parameters. 
\cr #' \itemize{ #' \item Input for tSNE dimension reduction can be another dimension reduction (default = 'pca') #' \item To use gene expression as input set dim_reduction_to_use = NULL @@ -2341,174 +2496,180 @@ runUMAPprojection = function(gobject, #' } #' @export runtSNE <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = c('cells', 'feats'), - dim_reduction_to_use = 'pca', - dim_reduction_name = NULL, - dimensions_to_use = 1:10, - name = NULL, - feats_to_use = NULL, - return_gobject = TRUE, - dims = 2, - perplexity = 30, - theta = 0.5, - do_PCA_first = FALSE, - set_seed = TRUE, - seed_number = 1234, - verbose = TRUE, - ...) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - reduction = match.arg(reduction, choices = c('cells', 'feats')) - - - # specify dim_reduction_name to use for pca input for tsne - if(!is.null(dim_reduction_to_use)) { - if(dim_reduction_to_use == 'pca') { - if(is.null(dim_reduction_name)) { - if(feat_type == 'rna') { - dim_reduction_name = 'pca' - } else { - dim_reduction_name = paste0(feat_type,'.','pca') - } - } - } - } - - - # specify name to use for umap - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'tsne' - } else { - name = paste0(feat_type,'.','tsne') - } - } - - - - - ## tsne on cells ## - if(reduction == 'cells') { - - ## using dimension reduction ## - if(!is.null(dim_reduction_to_use)) { - - ## TODO: check if reduction exists - dimObj_to_use = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'dimObj') - - provenance = prov(dimObj_to_use) - matrix_to_use = dimObj_to_use[] - matrix_to_use = matrix_to_use[, dimensions_to_use] + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reduction = c("cells", "feats"), + dim_reduction_to_use = "pca", + dim_reduction_name = NULL, + dimensions_to_use = 1:10, + name = NULL, + feats_to_use = NULL, + return_gobject = TRUE, + dims = 2, + perplexity = 30, + theta = 0.5, + do_PCA_first = FALSE, + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE, + ...) 
{ + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } else { - ## using original matrix ## - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - expr_values = expr_values[] # extract matrix - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } - - matrix_to_use = t_flex(expr_values) + reduction <- match.arg(reduction, choices = c("cells", "feats")) - } - # start seed - if(isTRUE(set_seed)) { - set.seed(seed = seed_number) + # specify dim_reduction_name to use for pca input for tsne + if (!is.null(dim_reduction_to_use)) { + if (dim_reduction_to_use == "pca") { + if (is.null(dim_reduction_name)) { + if (feat_type == "rna") { + dim_reduction_name <- "pca" + } else { + dim_reduction_name <- paste0(feat_type, ".", "pca") + } + } + } } - ## run tSNE ## - tsne_clus = Rtsne::Rtsne(X = matrix_to_use, - dims = dims, - perplexity = perplexity, - theta = theta, - pca = do_PCA_first, ...) - - tsne_clus_pos_DT = data.table::as.data.table(tsne_clus$Y) - - # data.table variables - cell_ID = NULL - tsne_clus_pos_DT[, cell_ID := rownames(matrix_to_use)] - # exit seed - if(isTRUE(set_seed)) { - set.seed(Sys.time()) + # specify name to use for umap + if (is.null(name)) { + if (feat_type == "rna") { + name <- "tsne" + } else { + name <- paste0(feat_type, ".", "tsne") + } } - if(isTRUE(return_gobject)) { - tsne_names = list_dim_reductions_names(gobject = gobject, data_type = reduction, - spat_unit = spat_unit, feat_type = feat_type, - dim_type = 'tsne') - if(name %in% tsne_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } + ## tsne on cells ## + if (reduction == "cells") { + ## using dimension reduction ## + if (!is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + dimObj_to_use <- getDimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "dimObj" + ) + + provenance <- prov(dimObj_to_use) + matrix_to_use <- dimObj_to_use[] + matrix_to_use <- matrix_to_use[, dimensions_to_use] + } else { + ## using original matrix ## + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } + + matrix_to_use <- t_flex(expr_values) + } + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + } - coordinates = tsne_clus$Y - 
rownames(coordinates) = rownames(matrix_to_use) + ## run tSNE ## + tsne_clus <- Rtsne::Rtsne( + X = matrix_to_use, + dims = dims, + perplexity = perplexity, + theta = theta, + pca = do_PCA_first, ... + ) - dimObject = create_dim_obj(name = name, - feat_type = feat_type, - spat_unit = spat_unit, - provenance = provenance, - reduction = reduction, - reduction_method = 'tsne', - coordinates = coordinates, - misc = tsne_clus) + tsne_clus_pos_DT <- data.table::as.data.table(tsne_clus$Y) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = dimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + # data.table variables + cell_ID <- NULL + tsne_clus_pos_DT[, cell_ID := rownames(matrix_to_use)] - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_tsne') - return(gobject) + # exit seed + if (isTRUE(set_seed)) { + set.seed(Sys.time()) + } - } else { - return(tsne_clus_pos_DT) + if (isTRUE(return_gobject)) { + tsne_names <- list_dim_reductions_names( + gobject = gobject, data_type = reduction, + spat_unit = spat_unit, feat_type = feat_type, + dim_type = "tsne" + ) + + if (name %in% tsne_names) { + cat(name, " has already been used, will be overwritten") + } + + + coordinates <- tsne_clus$Y + rownames(coordinates) <- rownames(matrix_to_use) + + dimObject <- create_dim_obj( + name = name, + feat_type = feat_type, + spat_unit = spat_unit, + provenance = provenance, + reduction = reduction, + reduction_method = "tsne", + coordinates = coordinates, + misc = tsne_clus + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, + dimObject = dimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_tsne") + return(gobject) + } else { + return(tsne_clus_pos_DT) + } + } else if (reduction == "feats") { + message("Not yet implemented") } - - - } else if(reduction == 'feats') { - - cat('\n Not yet implemented \n') - - } - } @@ -2542,196 +2703,199 @@ runtSNE <- function(gobject, #' @param toplevel_params parameters to extract #' @param verbose be verbose #' @param ... additional \code{\link[harmony]{HarmonyMatrix}} parameters -#' @return giotto object with updated Harmony dimension recuction -#' @details This is a simple wrapper for the HarmonyMatrix function in the Harmony package \doi{10.1038/s41592-019-0619-0}. +#' @return giotto object with updated Harmony dimension reduction +#' @details This is a simple wrapper for the HarmonyMatrix function in the +#' Harmony package \doi{10.1038/s41592-019-0619-0}. #' @export -runGiottoHarmony = function(gobject, - spat_unit = NULL, - feat_type = NULL, - vars_use = 'list_ID', - do_pca = FALSE, - expression_values = c('normalized', 'scaled', 'custom'), - reduction = 'cells', - dim_reduction_to_use = 'pca', - dim_reduction_name = NULL, - dimensions_to_use = 1:10, - name = NULL, - feats_to_use = NULL, - set_seed = TRUE, - seed_number = 1234, - toplevel_params = 2, - return_gobject = TRUE, - verbose = NULL, - ...) { - - - # verify if optional package is installed - package_check(pkg_name = "harmony", repository = "CRAN") - - - # print message with information # - wrap_msg("using 'Harmony' to integrate different datasets. If used in published research, please cite:\n") - - wrap_msg( - "Korsunsky, I., Millard, N., Fan, J. et al. 
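# --- Illustrative sketch (editor's note, not part of the patch) -------------
# A minimal, standalone illustration of the projection pattern used by
# runUMAPprojection() above: fit UMAP on a random subset with ret_model = TRUE,
# then place the remaining rows into the same embedding with umap_transform()
# and recombine in the original row order. Toy data only; none of the names
# below are part of the Giotto API.
if (requireNamespace("uwot", quietly = TRUE)) {
    mat <- matrix(rnorm(500 * 10), nrow = 500,
                  dimnames = list(paste0("cell_", 1:500), NULL))

    idx <- sort(sample(nrow(mat), 200))               # random subset to fit on
    fit <- uwot::umap(mat[idx, ], n_neighbors = 15, ret_model = TRUE)

    proj <- uwot::umap_transform(mat[-idx, ], model = fit)  # project the rest

    emb <- rbind(fit$embedding, proj)                 # recombine
    emb <- emb[match(rownames(mat), rownames(emb)), ] # restore original order
}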
+runGiottoHarmony <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + vars_use = "list_ID", + do_pca = FALSE, + expression_values = c("normalized", "scaled", "custom"), + reduction = "cells", + dim_reduction_to_use = "pca", + dim_reduction_name = NULL, + dimensions_to_use = 1:10, + name = NULL, + feats_to_use = NULL, + set_seed = TRUE, + seed_number = 1234, + toplevel_params = 2, + return_gobject = TRUE, + verbose = NULL, + ...) { + # verify if optional package is installed + package_check(pkg_name = "harmony", repository = "CRAN") + + + # print message with information # + message("using 'Harmony' to integrate different datasets. If used in + published research, please cite:") + + wrap_msg("Korsunsky, I., Millard, N., Fan, J. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289-1296 (2019). https://doi.org/10.1038/s41592-019-0619-0", - .initial = ' ', - .prefix = ' ' - ) + .initial = " ", + .prefix = " " + ) - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # specify dim_reduction_name to use for pca input for umap - if(!is.null(dim_reduction_to_use)) { - if(dim_reduction_to_use == 'pca') { - if(is.null(dim_reduction_name)) { - if(feat_type == 'rna') { - dim_reduction_name = 'pca' - } else { - dim_reduction_name = paste0(feat_type,'.','pca') + # specify dim_reduction_name to use for pca input for umap + if (!is.null(dim_reduction_to_use)) { + if (dim_reduction_to_use == "pca") { + if (is.null(dim_reduction_name)) { + if (feat_type == "rna") { + dim_reduction_name <- "pca" + } else { + dim_reduction_name <- paste0(feat_type, ".", "pca") + } + } } - } } - } - # specify name to use for harmony - if(is.null(name)) { - if(feat_type == 'rna') { - name = 'harmony' - } else { - name = paste0(feat_type,'.','harmony') + # specify name to use for harmony + if (is.null(name)) { + if (feat_type == "rna") { + name <- "harmony" + } else { + name <- paste0(feat_type, ".", "harmony") + } } - } - - - - - # set cores to use - #n_threads = determine_cores(cores = n_threads) - ## using dimension reduction ## - if(!is.null(dim_reduction_to_use)) { - ## TODO: check if reduction exists - matrix_to_use = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = reduction, # set to spat_unit? 
- reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'dimObj') - provenance = prov(matrix_to_use) - matrix_to_use = matrix_to_use[] - matrix_to_use = matrix_to_use[, dimensions_to_use] + # set cores to use + # n_threads = determine_cores(cores = n_threads) - } else { - ## using original matrix ## - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - provenance = prov(expr_values) - expr_values = expr_values[] # extract matrix - - - ## subset matrix - if(!is.null(feats_to_use)) { - expr_values = .create_feats_to_use_matrix(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - sel_matrix = expr_values, - feats_to_use = feats_to_use, - verbose = verbose) - } + ## using dimension reduction ## + if (!is.null(dim_reduction_to_use)) { + ## TODO: check if reduction exists + matrix_to_use <- get_dimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = reduction, # set to spat_unit? + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "dimObj" + ) + provenance <- prov(matrix_to_use) + matrix_to_use <- matrix_to_use[] + + matrix_to_use <- matrix_to_use[, dimensions_to_use] + } else { + ## using original matrix ## + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + provenance <- prov(expr_values) + expr_values <- expr_values[] # extract matrix + + + ## subset matrix + if (!is.null(feats_to_use)) { + expr_values <- .create_feats_to_use_matrix( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + sel_matrix = expr_values, + feats_to_use = feats_to_use, + verbose = verbose + ) + } - matrix_to_use = t_flex(expr_values) - } - - # get metadata - metadata = pDataDT(gobject, feat_type = feat_type, spat_unit = spat_unit) - - # start seed - if (isTRUE(set_seed)) { - set.seed(seed = seed_number) - on.exit(GiottoUtils::random_seed()) - } - - # run harmony - harmony_results = harmony::RunHarmony( - data_mat = matrix_to_use, - meta_data = metadata, - vars_use = vars_use, - do_pca = do_pca, - ... - ) - - - colnames(harmony_results) = paste0('Dim.', 1:ncol(harmony_results)) - rownames(harmony_results) = rownames(matrix_to_use) - - harmdimObject = create_dim_obj(name = name, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = provenance, - reduction = 'cells', # set to spat_unit? 
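# --- Illustrative sketch (editor's note, not part of the patch) -------------
# Standalone toy illustration of the harmony::RunHarmony() call pattern used
# by runGiottoHarmony() above: a cells-by-dims matrix plus a metadata table
# whose column named in vars_use defines the groups to integrate over.
# Toy data only; 'batch' is a made-up grouping variable, and the argument
# names simply mirror the call shown above.
if (requireNamespace("harmony", quietly = TRUE)) {
    pcs <- matrix(rnorm(300 * 10), nrow = 300,
                  dimnames = list(paste0("cell_", 1:300), paste0("Dim.", 1:10)))
    meta <- data.frame(batch = rep(c("A", "B", "C"), each = 100))

    harm <- harmony::RunHarmony(
        data_mat = pcs,
        meta_data = meta,
        vars_use = "batch"
    )
    # harm is a corrected cells-by-dims matrix, used above as new coordinates
}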
- reduction_method = 'harmony', - coordinates = harmony_results, - misc = NULL) - - # return giotto object or harmony results - if(isTRUE(return_gobject)) { - - #harmony_names = names(gobject@dimension_reduction[['cells']][[spat_unit]][['harmony']]) - - harmony_names = list_dim_reductions_names(gobject = gobject, - data_type = reduction, - spat_unit = spat_unit, - feat_type = feat_type, - dim_type = 'harmony') - - if(name %in% harmony_names) { - cat('\n ', name, ' has already been used with harmony, will be overwritten \n') + matrix_to_use <- t_flex(expr_values) } - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_dimReduction(gobject = gobject, dimObject = harmdimObject) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - + # get metadata + metadata <- pDataDT(gobject, feat_type = feat_type, spat_unit = spat_unit) - ## update parameters used ## - gobject = update_giotto_params(gobject, - description = '_harmony', - return_gobject = TRUE, - toplevel = toplevel_params) - return(gobject) + # start seed + if (isTRUE(set_seed)) { + set.seed(seed = seed_number) + on.exit(GiottoUtils::random_seed()) + } - } else { - return(harmdimObject) - } + # run harmony + harmony_results <- harmony::RunHarmony( + data_mat = matrix_to_use, + meta_data = metadata, + vars_use = vars_use, + do_pca = do_pca, + ... + ) -} + colnames(harmony_results) <- paste0("Dim.", 1:ncol(harmony_results)) + rownames(harmony_results) <- rownames(matrix_to_use) + harmdimObject <- create_dim_obj( + name = name, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = provenance, + reduction = "cells", # set to spat_unit? + reduction_method = "harmony", + coordinates = harmony_results, + misc = NULL + ) + # return giotto object or harmony results + if (isTRUE(return_gobject)) { + harmony_names <- list_dim_reductions_names( + gobject = gobject, + data_type = reduction, + spat_unit = spat_unit, + feat_type = feat_type, + dim_type = "harmony" + ) + if (name %in% harmony_names) { + cat(name, + " has already been used with harmony, will be overwritten") + } + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_dimReduction(gobject = gobject, + dimObject = harmdimObject) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_harmony", + return_gobject = TRUE, + toplevel = toplevel_params + ) + return(gobject) + } else { + return(harmdimObject) + } +} diff --git a/R/feature_set_enrichment.R b/R/feature_set_enrichment.R index 4e7c84074..f52fb6241 100644 --- a/R/feature_set_enrichment.R +++ b/R/feature_set_enrichment.R @@ -1,31 +1,43 @@ - #' @title doFeatureSetEnrichment #' @name doFeatureSetEnrichment #' @description Preform Gene Set Enrichment Analysis using marker genes #' @param dryrun do a dry run, default TRUE. -#' @param path_to_GSEA path to GSEA command line executable, e.g. gsea-XXX.jar. See details (1.) for more information. -#' @param GSEA_dataset path to a Human/Mouse collection from GSEA, e.g. Hallmarks C1. See details (2.) for more information. -#' @param GSEA_ranked_file path to .rnk file for GSEA. See details (3.) for more information -#' @param output_folder path to which the GSEA results will be saved. Default is current working directory. -#' @param name_analysis_folder default output subdirectory prefix to which results are saved. 
-#' Will live within output_folder; equivalent of "Analysis Name" in GSEA Application. -#' @param collapse only 'false' is supported. This will use your dataset as-is, in the original format. -#' @param mode option selected in Advanced Field "Collapsing Mode for Probe Sets => 1 gene" +#' @param path_to_GSEA path to GSEA command line executable, e.g. gsea-XXX.jar. +#' See details (1.) for more information. +#' @param GSEA_dataset path to a Human/Mouse collection from GSEA, e.g. +#' Hallmarks C1. See details (2.) for more information. +#' @param GSEA_ranked_file path to .rnk file for GSEA. See details (3.) for +#' more information +#' @param output_folder path to which the GSEA results will be saved. Default +#' is current working directory. +#' @param name_analysis_folder default output subdirectory prefix to which +#' results are saved. +#' Will live within output_folder; equivalent of +#' "Analysis Name" in GSEA Application. +#' @param collapse only 'false' is supported. This will use your dataset as-is, +#' in the original format. +#' @param mode option selected in Advanced Field "Collapsing Mode for +#' Probe Sets => 1 gene" #' @param norm normalization mode; only meandiv is supported. #' @param nperm number of permutations, default 1000 -#' @param scoring_scheme Default "weighted", equivalent of "enrichment statistic" in GSEA Application +#' @param scoring_scheme Default "weighted", equivalent of +#' "enrichment statistic" in GSEA Application #' @param plot_top_x Default 20, number of enrichment plots to produce. -#' @param set_max default 500, equivalent to "max size; exclude larger sets" in Basic Fields in GSEA Application -#' @param set_min default 15, equivalent to "min size; exclude smaller sets" in Basic Fields in GSEA Application +#' @param set_max default 500, equivalent to "max size; exclude larger sets" +#' in Basic Fields in GSEA Application +#' @param set_min default 15, equivalent to "min size; exclude smaller sets" +#' in Basic Fields in GSEA Application #' @details #' NECESSARY PREREQUISITES #' 1. download and install the COMMAND line (all platforms) gsea-XXX.jar #' https://www.gsea-msigdb.org/gsea/downloads.jsp #' 1.1. download zip file -#' 1.2. unzip and move to known location (e.g. in path/to/your/applications/gsea/GSEA_4.3.2) +#' 1.2. unzip and move to known location +#' (e.g. in path/to/your/applications/gsea/GSEA_4.3.2) #' #' 2. download the Human and Mouse collections -#' https://www.gsea-msigdb.org/gsea/msigdb/index.jsp or zipped folder https://www.gsea-msigdb.org/gsea/downloads.jsp (all downloaded) +#' https://www.gsea-msigdb.org/gsea/msigdb/index.jsp or zipped folder +#' https://www.gsea-msigdb.org/gsea/downloads.jsp (all downloaded) #' #' 3. 
create ranked gene lists #' format: data.table or data.frame with 2 columns @@ -38,100 +50,118 @@ #' https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Syntax #' @export doFeatureSetEnrichment <- function(dryrun = TRUE, - path_to_GSEA = NULL, - GSEA_dataset = NULL, - GSEA_ranked_file = NULL, - output_folder = NULL, - name_analysis_folder = 'my_GSEA_analysis', - collapse = 'false', - mode = c("Abs_max_of_probes", - "Max_probe", - "Median_of_probes", - "Mean_of_probes", - "Sum_of_probes"), - norm = 'meandiv', - nperm = 1000, - scoring_scheme = 'weighted', - plot_top_x = 20, - set_max = 500, - set_min = 15) { - - # set don't run to false as a start - dont_run = FALSE - - # SYSTEM CHECK FOR JAVA - java_not_installed = as.logical(system("java -version")) - # returns 0 if java is installed (i.e., command runs successfully), 1 otherwise - if(java_not_installed) stop(wrap_txt("Java must be installed for doFeatureSetEnrichment() to run. Please install Java: https://www.java.com/en/download/", errWidth = TRUE)) - - - mode = match.arg(mode, choices = c("Abs_max_of_probes", - "Max_probe", - "Median_of_probes", - "Mean_of_probes", - "Sum_of_probes")) - - if(is.null(output_folder)) output_folder = paste0(getwd(), "/Feature_set_enrichment_results/") - - if(!dir.exists(output_folder)) { - wrap_msg(paste0("Directory does not yet exist. Creating directory at:", output_folder)) - dir.create(output_folder) - } - - # check for path to GSEA tool - if(is.null(path_to_GSEA)) stop('Path to the GSEA directory needs to be provided \n') - if(!file.exists(path_to_GSEA)) stop('Path to the GSEA directory does not exist \n') - - path_to_GSEA = paste0('"',path_to_GSEA,'"') - - # check for path to GSEA dataset .gmt - if(is.null(GSEA_dataset)) { - warning('Path to a GSEA dataset needs to be provided, only dryrun will work for testing') - dont_run = TRUE - GSEA_dataset = 'test.gmt' - } - - GSEA_dataset = paste0('"',GSEA_dataset,'"') - - # check for GSRA ranked file (path or data.frame) - if(is.null(GSEA_ranked_file)) { - warning('A ranked gene file needs to be provided, only dryrun will work for testing') - dont_run = TRUE - GSEA_ranked_file = 'my_ranked_file.rnk' - } else if(inherits(GSEA_ranked_file, 'character')) { - wrap_msg('The ranked list looks like a path to a file \n') - if(!file.exists(GSEA_ranked_file)) stop('Path to the ranked file does not exist \n') - } else if(inherits(GSEA_ranked_file, 'data.frame')) { - wrap_msg('The ranked list looks like a data.frame \n') - - # write data.frame to temporary folder and use that path - temp_location_path = paste0(tempdir(), '/', 'temp_rankfile.rnk') - data.table::fwrite(x = GSEA_ranked_file, - file = temp_location_path, - sep = '\t', row.names = FALSE, col.names = FALSE) - - GSEA_ranked_file = temp_location_path - - } - - - # make sure all paths and files can be read by java - output_folder = paste0('"',output_folder,'"') - name_analysis_folder = paste0('"',name_analysis_folder,'"') - - # 1. identify operating system - my_os = get_os() - - # 2. 
create execution path - operation = 'GSEAPreranked' - - if(my_os == 'windows') { - execution_path = paste0(path_to_GSEA,'/','gsea-cli.bat', ' ', operation) - } else { - execution_path = paste0(path_to_GSEA,'/','gsea-cli.sh', ' ', operation) - } - - created_command = sprintf("%s \\ + path_to_GSEA = NULL, + GSEA_dataset = NULL, + GSEA_ranked_file = NULL, + output_folder = NULL, + name_analysis_folder = "my_GSEA_analysis", + collapse = "false", + mode = c( + "Abs_max_of_probes", + "Max_probe", + "Median_of_probes", + "Mean_of_probes", + "Sum_of_probes" + ), + norm = "meandiv", + nperm = 1000, + scoring_scheme = "weighted", + plot_top_x = 20, + set_max = 500, + set_min = 15) { + # set don't run to false as a start + dont_run <- FALSE + + # SYSTEM CHECK FOR JAVA + java_not_installed <- as.logical(system("java -version")) + # returns 0 if java is installed (i.e., command runs successfully), + # 1 otherwise + if (java_not_installed) + stop(wrap_txt("Java must be installed for doFeatureSetEnrichment() to + run. Please install Java: https://www.java.com/en/download/", + errWidth = TRUE)) + + + mode <- match.arg(mode, choices = c( + "Abs_max_of_probes", + "Max_probe", + "Median_of_probes", + "Mean_of_probes", + "Sum_of_probes" + )) + + if (is.null(output_folder)) output_folder <- paste0( + getwd(), "/Feature_set_enrichment_results/") + + if (!dir.exists(output_folder)) { + wrap_msg(paste0("Directory does not yet exist. Creating directory at:", + output_folder)) + dir.create(output_folder) + } + + # check for path to GSEA tool + if (is.null(path_to_GSEA)) + stop("Path to the GSEA directory needs to be provided") + if (!file.exists(path_to_GSEA)) + stop("Path to the GSEA directory does not exist") + + path_to_GSEA <- paste0('"', path_to_GSEA, '"') + + # check for path to GSEA dataset .gmt + if (is.null(GSEA_dataset)) { + warning("Path to a GSEA dataset needs to be provided, only dryrun will + work for testing") + dont_run <- TRUE + GSEA_dataset <- "test.gmt" + } + + GSEA_dataset <- paste0('"', GSEA_dataset, '"') + + # check for GSRA ranked file (path or data.frame) + if (is.null(GSEA_ranked_file)) { + warning("A ranked gene file needs to be provided, only dryrun will work + for testing") + dont_run <- TRUE + GSEA_ranked_file <- "my_ranked_file.rnk" + } else if (inherits(GSEA_ranked_file, "character")) { + message("The ranked list looks like a path to a file") + if (!file.exists(GSEA_ranked_file)) + stop("Path to the ranked file does not exist") + } else if (inherits(GSEA_ranked_file, "data.frame")) { + message("The ranked list looks like a data.frame") + + # write data.frame to temporary folder and use that path + temp_location_path <- paste0(tempdir(), "/", "temp_rankfile.rnk") + data.table::fwrite( + x = GSEA_ranked_file, + file = temp_location_path, + sep = "\t", row.names = FALSE, col.names = FALSE + ) + + GSEA_ranked_file <- temp_location_path + } + + + # make sure all paths and files can be read by java + output_folder <- paste0('"', output_folder, '"') + name_analysis_folder <- paste0('"', name_analysis_folder, '"') + + # 1. identify operating system + my_os <- get_os() + + # 2. 
create execution path + operation <- "GSEAPreranked" + + if (my_os == "windows") { + execution_path <- paste0( + path_to_GSEA, "/", "gsea-cli.bat", " ", operation) + } else { + execution_path <- paste0( + path_to_GSEA, "/", "gsea-cli.sh", " ", operation) + } + + created_command <- sprintf( + "%s \\ -gmx %s \\ -collapse %s \\ -mode %s \\ @@ -148,28 +178,28 @@ doFeatureSetEnrichment <- function(dryrun = TRUE, -zip_report false \\ -rpt_label %s \\ -out %s", - execution_path, - GSEA_dataset, - collapse, - mode, - norm, - nperm, - GSEA_ranked_file, - scoring_scheme, - plot_top_x, - set_max, - set_min, - name_analysis_folder, - output_folder) - - - if(isTRUE(dryrun) | isTRUE(dont_run)) { - wrap_msg('DRYRUN VERSION \n \n') - print(created_command) - } else { - wrap_msg('START GSEA RUN \n \n') - - system(created_command) - } - + execution_path, + GSEA_dataset, + collapse, + mode, + norm, + nperm, + GSEA_ranked_file, + scoring_scheme, + plot_top_x, + set_max, + set_min, + name_analysis_folder, + output_folder + ) + + + if (isTRUE(dryrun) | isTRUE(dont_run)) { + message("DRYRUN VERSION") + print(created_command) + } else { + message("START GSEA RUN") + + system(created_command) + } } diff --git a/R/general_help.R b/R/general_help.R index 0463fe6da..0412c4fe3 100644 --- a/R/general_help.R +++ b/R/general_help.R @@ -8,28 +8,27 @@ #' @keywords internal #' @return gini coefficient mygini_fun <- function(x, - weights = rep(1,length(x))) { - - # adapted from R package GiniWegNeg - dataset = cbind(x, weights) - ord_x = order(x) - dataset_ord = dataset[ord_x,] - x = dataset_ord[,1] - weights = dataset_ord[,2] - N = sum(weights) - xw = x*weights - C_i = cumsum(weights) - num_1 = sum(xw*C_i) - num_2 = sum(xw) - num_3 = sum(xw*weights) - G_num = (2/N^2)*num_1-(1/N)*num_2-(1/N^2)*num_3 - t_neg = subset(xw, xw<=0) - T_neg = sum(t_neg) - T_pos = sum(xw)+abs(T_neg) - n_RSV = (2*(T_pos+(abs(T_neg)))/N) - mean_RSV = (n_RSV/2) - G_RSV = (1/mean_RSV)*G_num - return(G_RSV) + weights = rep(1, length(x))) { + # adapted from R package GiniWegNeg + dataset <- cbind(x, weights) + ord_x <- order(x) + dataset_ord <- dataset[ord_x, ] + x <- dataset_ord[, 1] + weights <- dataset_ord[, 2] + N <- sum(weights) + xw <- x * weights + C_i <- cumsum(weights) + num_1 <- sum(xw * C_i) + num_2 <- sum(xw) + num_3 <- sum(xw * weights) + G_num <- (2 / N^2) * num_1 - (1 / N) * num_2 - (1 / N^2) * num_3 + t_neg <- subset(xw, xw <= 0) + T_neg <- sum(t_neg) + T_pos <- sum(xw) + abs(T_neg) + n_RSV <- (2 * (T_pos + (abs(T_neg))) / N) + mean_RSV <- (n_RSV / 2) + G_RSV <- (1 / mean_RSV) * G_num + return(G_RSV) } @@ -38,17 +37,16 @@ mygini_fun <- function(x, #' @keywords internal #' @return gini coefficient extended_gini_fun <- function(x, - weights = rep(1, length = length(x)), - minimum_length = 16) { - - if(length(x) < minimum_length) { - difference = minimum_length - length(x) - min_value = min(x) - x = c(x,rep(min_value, difference)) - } + weights = rep(1, length = length(x)), + minimum_length = 16) { + if (length(x) < minimum_length) { + difference <- minimum_length - length(x) + min_value <- min(x) + x <- c(x, rep(min_value, difference)) + } - result <- mygini_fun(x = x, weights = weights) - return(result) + result <- mygini_fun(x = x, weights = weights) + return(result) } @@ -58,116 +56,120 @@ extended_gini_fun <- function(x, #' @name .kmeans_binarize #' @description create binarized scores from a vector using kmeans #' @keywords internal -.kmeans_binarize = function(x, - nstart = 3, - iter.max = 10, - seed = NULL) { - - if(!is.null(seed)) { 
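# --- Illustrative sketch (editor's note, not part of the patch) -------------
# Hedged usage sketch for doFeatureSetEnrichment() shown above. The ranked
# input is a two-column table (feature name, ranking statistic) as described
# in the details section; dryrun = TRUE only prints the assembled gsea-cli
# command. The GSEA path below is hypothetical and must point to an existing
# local GSEA install, and Java must be available, because the function checks
# for both even on a dry run.
ranked_dt <- data.frame(
    feat = paste0("gene_", 1:100),
    stat = sort(rnorm(100), decreasing = TRUE)
)

doFeatureSetEnrichment(
    dryrun = TRUE,
    path_to_GSEA = "path/to/your/applications/gsea/GSEA_4.3.2",  # hypothetical
    GSEA_ranked_file = ranked_dt,
    output_folder = tempdir()
)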
- on.exit(random_seed(), add = TRUE) - set.seed(seed) - } - sel_gene_km = stats::kmeans(x, centers = 2, nstart = nstart, iter.max = iter.max)$cluster - mean_1 = mean(x[sel_gene_km == 1]) - mean_2 = mean(x[sel_gene_km == 2]) - - if(mean_1 > mean_2) { - mean_1_value = 1 - mean_2_value = 0 - } else { - mean_1_value = 0 - mean_2_value = 1 - } - - sel_gene_bin = x - sel_gene_bin[sel_gene_km == 1] = mean_1_value - sel_gene_bin[sel_gene_km == 2] = mean_2_value - - return(sel_gene_bin) +.kmeans_binarize <- function(x, + nstart = 3, + iter.max = 10, + seed = NULL) { + if (!is.null(seed)) { + on.exit(random_seed(), add = TRUE) + set.seed(seed) + } + sel_gene_km <- stats::kmeans( + x, centers = 2, nstart = nstart, iter.max = iter.max)$cluster + mean_1 <- mean(x[sel_gene_km == 1]) + mean_2 <- mean(x[sel_gene_km == 2]) + + if (mean_1 > mean_2) { + mean_1_value <- 1 + mean_2_value <- 0 + } else { + mean_1_value <- 0 + mean_2_value <- 1 + } + + sel_gene_bin <- x + sel_gene_bin[sel_gene_km == 1] <- mean_1_value + sel_gene_bin[sel_gene_km == 2] <- mean_2_value + return(sel_gene_bin) } #' @title .kmeans_arma_binarize #' @name .kmeans_arma_binarize #' @description create binarized scores from a vector using kmeans_arma #' @keywords internal -.kmeans_arma_binarize = function(x, n_iter = 5, seed = NULL) { - - - if(!is.null(seed)) { - on.exit(random_seed(), add = TRUE) - set.seed(seed) - } - sel_gene_km_res = ClusterR::KMeans_arma(data = as.matrix(x), - clusters = 2, - n_iter = n_iter) - sel_gene_km = ClusterR::predict_KMeans(data = as.matrix(x), - CENTROIDS = sel_gene_km_res) - - mean_1 = mean(x[sel_gene_km == 1]) - mean_2 = mean(x[sel_gene_km == 2]) +.kmeans_arma_binarize <- function(x, n_iter = 5, seed = NULL) { + if (!is.null(seed)) { + on.exit(random_seed(), add = TRUE) + set.seed(seed) + } + sel_gene_km_res <- ClusterR::KMeans_arma( + data = as.matrix(x), + clusters = 2, + n_iter = n_iter + ) + sel_gene_km <- ClusterR::predict_KMeans( + data = as.matrix(x), + CENTROIDS = sel_gene_km_res + ) - if(mean_1 > mean_2) { - mean_1_value = 1 - mean_2_value = 0 - } else { - mean_1_value = 0 - mean_2_value = 1 - } + mean_1 <- mean(x[sel_gene_km == 1]) + mean_2 <- mean(x[sel_gene_km == 2]) - sel_gene_bin = x - sel_gene_bin[sel_gene_km == 1] = mean_1_value - sel_gene_bin[sel_gene_km == 2] = mean_2_value + if (mean_1 > mean_2) { + mean_1_value <- 1 + mean_2_value <- 0 + } else { + mean_1_value <- 0 + mean_2_value <- 1 + } - return(sel_gene_bin) + sel_gene_bin <- x + sel_gene_bin[sel_gene_km == 1] <- mean_1_value + sel_gene_bin[sel_gene_km == 2] <- mean_2_value + return(sel_gene_bin) } #' @title .kmeans_arma_subset_binarize #' @name .kmeans_arma_subset_binarize -#' @description create binarized scores from a subsetted vector using kmeans_arma +#' @description create binarized scores from a subsetted vector using +#' kmeans_arma #' @keywords internal -.kmeans_arma_subset_binarize = function(x, - n_iter = 5, - extreme_nr = 20, - sample_nr = 200, - seed = NULL) { - - length_x = length(x) - - vector_x = sort(x) - first_set = vector_x[1:extreme_nr] - last_set = vector_x[(length_x-(extreme_nr-1)):length_x] - random_set = sample(vector_x[(extreme_nr+1):(length_x-extreme_nr)], size = sample_nr) - testset = c(first_set, last_set, random_set) - - if(!is.null(seed)) { - on.exit(random_seed(), add = TRUE) - set.seed(seed) - } - sel_gene_km_res = ClusterR::KMeans_arma(data = as.matrix(testset), - clusters = 2, - n_iter = n_iter) - sel_gene_km = ClusterR::predict_KMeans(data = as.matrix(x), - CENTROIDS = sel_gene_km_res) - - mean_1 = 
mean(x[sel_gene_km == 1]) - mean_2 = mean(x[sel_gene_km == 2]) - - if(mean_1 > mean_2) { - mean_1_value = 1 - mean_2_value = 0 - } else { - mean_1_value = 0 - mean_2_value = 1 - } - - sel_gene_bin = x - sel_gene_bin[sel_gene_km == 1] = mean_1_value - sel_gene_bin[sel_gene_km == 2] = mean_2_value - - return(sel_gene_bin) +.kmeans_arma_subset_binarize <- function(x, + n_iter = 5, + extreme_nr = 20, + sample_nr = 200, + seed = NULL) { + length_x <- length(x) + + vector_x <- sort(x) + first_set <- vector_x[1:extreme_nr] + last_set <- vector_x[(length_x - (extreme_nr - 1)):length_x] + random_set <- sample( + vector_x[(extreme_nr + 1):(length_x - extreme_nr)], size = sample_nr) + testset <- c(first_set, last_set, random_set) + + if (!is.null(seed)) { + on.exit(random_seed(), add = TRUE) + set.seed(seed) + } + sel_gene_km_res <- ClusterR::KMeans_arma( + data = as.matrix(testset), + clusters = 2, + n_iter = n_iter + ) + sel_gene_km <- ClusterR::predict_KMeans( + data = as.matrix(x), + CENTROIDS = sel_gene_km_res + ) + + mean_1 <- mean(x[sel_gene_km == 1]) + mean_2 <- mean(x[sel_gene_km == 2]) + if (mean_1 > mean_2) { + mean_1_value <- 1 + mean_2_value <- 0 + } else { + mean_1_value <- 0 + mean_2_value <- 1 + } + + sel_gene_bin <- x + sel_gene_bin[sel_gene_km == 1] <- mean_1_value + sel_gene_bin[sel_gene_km == 2] <- mean_2_value + + return(sel_gene_bin) } @@ -176,50 +178,45 @@ extended_gini_fun <- function(x, #' @name kmeans_binarize_wrapper #' @description wrapper for different binarization functions #' @keywords internal -kmeans_binarize_wrapper = function( - expr_values, - subset_feats = NULL, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - seed = NULL -) { - - - # expression values - if(!is.null(subset_feats)) { - expr_values = expr_values[rownames(expr_values) %in% subset_feats, ] - } - - # check parameter - kmeans_algo = match.arg( - arg = kmeans_algo, - choices = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset') - ) - - bin_matrix <- switch( - kmeans_algo, - "kmeans" = t_flex(apply( - X = expr_values, MARGIN = 1, FUN = .kmeans_binarize, - nstart = nstart, iter.max = iter_max, seed = seed - )), - "kmeans_arma" = t_flex(apply( - X = expr_values, MARGIN = 1, FUN = .kmeans_arma_binarize, - n_iter = iter_max, seed = seed - )), - "kmeans_arma_subset" = t_flex(apply( - X = expr_values, MARGIN = 1, FUN = .kmeans_arma_subset_binarize, - n_iter = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - seed = seed - )) - ) - - return(bin_matrix) +kmeans_binarize_wrapper <- function( + expr_values, + subset_feats = NULL, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + seed = NULL) { + # expression values + if (!is.null(subset_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ] + } + + # check parameter + kmeans_algo <- match.arg( + arg = kmeans_algo, + choices = c("kmeans", "kmeans_arma", "kmeans_arma_subset") + ) + bin_matrix <- switch(kmeans_algo, + "kmeans" = t_flex(apply( + X = expr_values, MARGIN = 1, FUN = .kmeans_binarize, + nstart = nstart, iter.max = iter_max, seed = seed + )), + "kmeans_arma" = t_flex(apply( + X = expr_values, MARGIN = 1, FUN = .kmeans_arma_binarize, + n_iter = iter_max, seed = seed + )), + "kmeans_arma_subset" = t_flex(apply( + X = expr_values, MARGIN = 1, FUN = .kmeans_arma_subset_binarize, + n_iter = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + 
seed = seed + )) + ) + + return(bin_matrix) } @@ -228,16 +225,14 @@ kmeans_binarize_wrapper = function( #' @name .rank_binarize #' @description create binarized scores from a vector using arbitrary rank #' @keywords internal -.rank_binarize = function(x, max_rank = 200) { - - sel_gene_rank = rank(-x, ties.method = 'average') +.rank_binarize <- function(x, max_rank = 200) { + sel_gene_rank <- rank(-x, ties.method = "average") - sel_gene_bin = x - sel_gene_bin[sel_gene_rank <= max_rank] = 1 - sel_gene_bin[sel_gene_rank > max_rank] = 0 - - return(sel_gene_bin) + sel_gene_bin <- x + sel_gene_bin[sel_gene_rank <= max_rank] <- 1 + sel_gene_bin[sel_gene_rank > max_rank] <- 0 + return(sel_gene_bin) } @@ -246,19 +241,19 @@ kmeans_binarize_wrapper = function( #' @name rank_binarize_wrapper #' @description wrapper for rank binarization function #' @keywords internal -rank_binarize_wrapper = function(expr_values, - subset_feats = NULL, - percentage_rank = 30) { - - # expression values - if(!is.null(subset_feats)) { - expr_values = expr_values[rownames(expr_values) %in% subset_feats, ] - } +rank_binarize_wrapper <- function(expr_values, + subset_feats = NULL, + percentage_rank = 30) { + # expression values + if (!is.null(subset_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ] + } - max_rank = (ncol(expr_values)/100)*percentage_rank - bin_matrix = t_flex(apply(X = expr_values, MARGIN = 1, FUN = .rank_binarize, max_rank = max_rank)) + max_rank <- (ncol(expr_values) / 100) * percentage_rank + bin_matrix <- t_flex(apply( + X = expr_values, MARGIN = 1, FUN = .rank_binarize, max_rank = max_rank)) - return(bin_matrix) + return(bin_matrix) } @@ -269,91 +264,99 @@ rank_binarize_wrapper = function(expr_values, #' @title convertEnsemblToGeneSymbol #' @name convertEnsemblToGeneSymbol -#' @description This function convert ensembl gene IDs from a matrix to official gene symbols +#' @description This function convert ensembl gene IDs from a matrix to +#' official gene symbols #' @param matrix an expression matrix with ensembl gene IDs as rownames #' @param species species to use for gene symbol conversion #' @return expression matrix with gene symbols as rownames #' @details This function requires that the biomaRt library is installed #' @export -convertEnsemblToGeneSymbol = function(matrix, - species = c('mouse', 'human')) { - - # data.table: set global variable - dupes = mgi_symbol = gene_symbol = ensembl_gene_id = hgnc_symbol = NULL - - package_check('biomaRt', repository = 'Bioc') - - species = match.arg(species, choices = c('mouse', 'human')) - - if(species == 'mouse') { - - # ensembl IDs to change - ensemblsIDS = rownames(matrix) - - # prepare ensembl database - ensembl = biomaRt::useMart("ensembl", - dataset = "mmusculus_gene_ensembl") - gene_names = biomaRt::getBM(attributes= c('mgi_symbol', 'ensembl_gene_id'), - filters = 'ensembl_gene_id', - values = ensemblsIDS, - mart = ensembl) - gene_names_DT = data.table::as.data.table(gene_names) - gene_names_DT[, dupes := duplicated(mgi_symbol)] - gene_names_DT[, gene_symbol := ifelse(any(dupes) == FALSE, mgi_symbol, - ifelse(mgi_symbol == "", ensembl_gene_id, 'temporary')), by = mgi_symbol] - gene_names_DT[, gene_symbol := ifelse(mgi_symbol == '', ensembl_gene_id, gene_symbol)] - gene_names_DT[, gene_symbol := ifelse(gene_symbol == 'temporary', paste0(mgi_symbol,'--', 1:.N), gene_symbol), by = mgi_symbol] - - # filter - matrix = matrix[rownames(matrix) %in% gene_names_DT$ensembl_gene_id, ] - - # create swapping vector - new_symbols 
= gene_names_DT[['gene_symbol']] - names(new_symbols) = gene_names_DT[['ensembl_gene_id']] - - # replace - new_rownames = new_symbols[rownames(matrix)] - rownames(matrix) = new_rownames - - return(matrix) - - } - - if(species == 'human') { - - # ensembl IDs to change - ensemblsIDS = rownames(matrix) - - # prepare ensembl database - ensembl = biomaRt::useMart("ensembl", - dataset = "hsapiens_gene_ensembl") - gene_names = biomaRt::getBM(attributes= c('hgnc_symbol', 'ensembl_gene_id'), - filters = 'ensembl_gene_id', - values = ensemblsIDS, - mart = ensembl) - gene_names_DT = data.table::as.data.table(gene_names) - gene_names_DT[, dupes := duplicated(hgnc_symbol)] - gene_names_DT[, gene_symbol := ifelse(any(dupes) == FALSE, hgnc_symbol, - ifelse(hgnc_symbol == "", ensembl_gene_id, 'temporary')), by = hgnc_symbol] - gene_names_DT[, gene_symbol := ifelse(hgnc_symbol == '', ensembl_gene_id, gene_symbol)] - gene_names_DT[, gene_symbol := ifelse(gene_symbol == 'temporary', paste0(hgnc_symbol,'--', 1:.N), gene_symbol), by = hgnc_symbol] +convertEnsemblToGeneSymbol <- function(matrix, + species = c("mouse", "human")) { + # data.table: set global variable + dupes <- mgi_symbol <- gene_symbol <- ensembl_gene_id <- hgnc_symbol <- NULL - # filter - matrix = matrix[rownames(matrix) %in% gene_names_DT$ensembl_gene_id, ] + package_check("biomaRt", repository = "Bioc") - # create swapping vector - new_symbols = gene_names_DT[['gene_symbol']] - names(new_symbols) = gene_names_DT[['ensembl_gene_id']] + species <- match.arg(species, choices = c("mouse", "human")) - # replace - new_rownames = new_symbols[rownames(matrix)] - rownames(matrix) = new_rownames + if (species == "mouse") { + # ensembl IDs to change + ensemblsIDS <- rownames(matrix) - return(matrix) - - } + # prepare ensembl database + ensembl <- biomaRt::useMart("ensembl", + dataset = "mmusculus_gene_ensembl" + ) + gene_names <- biomaRt::getBM( + attributes = c("mgi_symbol", "ensembl_gene_id"), + filters = "ensembl_gene_id", + values = ensemblsIDS, + mart = ensembl + ) + gene_names_DT <- data.table::as.data.table(gene_names) + gene_names_DT[, dupes := duplicated(mgi_symbol)] + gene_names_DT[, gene_symbol := ifelse(any(dupes) == FALSE, mgi_symbol, + ifelse(mgi_symbol == "", ensembl_gene_id, "temporary") + ), by = mgi_symbol] + gene_names_DT[, gene_symbol := ifelse( + mgi_symbol == "", ensembl_gene_id, gene_symbol)] + gene_names_DT[, gene_symbol := ifelse( + gene_symbol == "temporary", + paste0(mgi_symbol, "--", 1:.N), gene_symbol), by = mgi_symbol] + + # filter + matrix <- matrix[rownames(matrix) %in% gene_names_DT$ensembl_gene_id, ] + + # create swapping vector + new_symbols <- gene_names_DT[["gene_symbol"]] + names(new_symbols) <- gene_names_DT[["ensembl_gene_id"]] + + # replace + new_rownames <- new_symbols[rownames(matrix)] + rownames(matrix) <- new_rownames + + return(matrix) + } + if (species == "human") { + # ensembl IDs to change + ensemblsIDS <- rownames(matrix) + # prepare ensembl database + ensembl <- biomaRt::useMart("ensembl", + dataset = "hsapiens_gene_ensembl" + ) + gene_names <- biomaRt::getBM( + attributes = c("hgnc_symbol", "ensembl_gene_id"), + filters = "ensembl_gene_id", + values = ensemblsIDS, + mart = ensembl + ) + gene_names_DT <- data.table::as.data.table(gene_names) + gene_names_DT[, dupes := duplicated(hgnc_symbol)] + gene_names_DT[, gene_symbol := ifelse(any(dupes) == FALSE, hgnc_symbol, + ifelse(hgnc_symbol == "", ensembl_gene_id, "temporary") + ), by = hgnc_symbol] + gene_names_DT[, gene_symbol := ifelse( + hgnc_symbol == 
"", ensembl_gene_id, gene_symbol)] + gene_names_DT[, gene_symbol := ifelse( + gene_symbol == "temporary", + paste0(hgnc_symbol, "--", 1:.N), gene_symbol), by = hgnc_symbol] + + # filter + matrix <- matrix[rownames(matrix) %in% gene_names_DT$ensembl_gene_id, ] + + # create swapping vector + new_symbols <- gene_names_DT[["gene_symbol"]] + names(new_symbols) <- gene_names_DT[["ensembl_gene_id"]] + + # replace + new_rownames <- new_symbols[rownames(matrix)] + rownames(matrix) <- new_rownames + + return(matrix) + } } @@ -373,40 +376,42 @@ convertEnsemblToGeneSymbol = function(matrix, #' @title Polygon creation and smoothing for parallel #' @name gpoly_from_dfr_smoothed_wrapped #' @keywords internal -gpoly_from_dfr_smoothed_wrapped = function( - segmdfr, - name = 'cell', - calc_centroids = FALSE, - smooth_polygons = FALSE, - vertices = 20L, - k = 3L, - set_neg_to_zero = TRUE, - skip_eval_dfr = FALSE, - copy_dt = TRUE, - verbose = TRUE -) { - - gpoly = createGiottoPolygonsFromDfr( - segmdfr = segmdfr, - name = name, - calc_centroids = FALSE, - skip_eval_dfr = skip_eval_dfr, - copy_dt = copy_dt, - verbose = verbose - ) - if(isTRUE(smooth_polygons)) gpoly = smoothGiottoPolygons( - gpolygon = gpoly, - vertices = vertices, - k = k, - set_neg_to_zero = set_neg_to_zero - ) - if(isTRUE(calc_centroids)) gpoly = centroids(gpoly, append_gpolygon = TRUE) - - slot(gpoly, 'spatVector') = terra::wrap(slot(gpoly, 'spatVector')) - if(isTRUE(calc_centroids)) { - slot(gpoly, 'spatVectorCentroids') = terra::wrap(slot(gpoly, 'spatVectorCentroids')) - } - return(gpoly) +gpoly_from_dfr_smoothed_wrapped <- function( + segmdfr, + name = "cell", + calc_centroids = FALSE, + smooth_polygons = FALSE, + vertices = 20L, + k = 3L, + set_neg_to_zero = TRUE, + skip_eval_dfr = FALSE, + copy_dt = TRUE, + verbose = TRUE) { + gpoly <- createGiottoPolygonsFromDfr( + segmdfr = segmdfr, + name = name, + calc_centroids = FALSE, + skip_eval_dfr = skip_eval_dfr, + copy_dt = copy_dt, + verbose = verbose + ) + if (isTRUE(smooth_polygons)) { + gpoly <- smoothGiottoPolygons( + gpolygon = gpoly, + vertices = vertices, + k = k, + set_neg_to_zero = set_neg_to_zero + ) + } + if (isTRUE(calc_centroids)) gpoly <- centroids( + gpoly, append_gpolygon = TRUE) + + slot(gpoly, "spatVector") <- terra::wrap(slot(gpoly, "spatVector")) + if (isTRUE(calc_centroids)) { + slot(gpoly, "spatVectorCentroids") <- terra::wrap( + slot(gpoly, "spatVectorCentroids")) + } + return(gpoly) } @@ -415,88 +420,95 @@ gpoly_from_dfr_smoothed_wrapped = function( #' @title get10Xmatrix #' @name get10Xmatrix -#' @description This function creates an expression matrix from a 10X structured folder +#' @description This function creates an expression matrix from a 10X +#' structured folder #' @param path_to_data path to the 10X folder -#' @param gene_column_index which column from the features or genes .tsv file to use for row ids +#' @param gene_column_index which column from the features or genes .tsv file +#' to use for row ids #' @param remove_zero_rows removes rows with sum equal to zero -#' @param split_by_type split into multiple matrices based on 3rd column of features.tsv(.gz) +#' @param split_by_type split into multiple matrices based on 3rd column of +#' features.tsv(.gz) #' @return sparse expression matrix from 10X -#' @details A typical 10X folder is named raw_feature_bc_matrix or filtered_feature_bc_matrix and it has 3 files: +#' @details A typical 10X folder is named raw_feature_bc_matrix or +#' filtered_feature_bc_matrix and it has 3 files: #' \itemize{ #' 
\item{barcodes.tsv(.gz)} #' \item{features.tsv(.gz) or genes.tsv(.gz)} #' \item{matrix.mtx(.gz)} #' } -#' By default the first column of the features or genes .tsv file will be used, however if multiple -#' annotations are provided (e.g. ensembl gene ids and gene symbols) the user can select another column. +#' By default the first column of the features or genes .tsv file will be used, +#' however if multiple +#' annotations are provided (e.g. ensembl gene ids and gene symbols) the user +#' can select another column. #' @export -get10Xmatrix = function(path_to_data, - gene_column_index = 1, - remove_zero_rows = TRUE, - split_by_type = TRUE) { - - # data.table variables - total = gene_symbol = gene_id = gene_id_num = cell_id = cell_id_num = sort_gene_id_num = NULL - - # data directory - files_10X = list.files(path_to_data) - - # get barcodes and create vector - barcodes_file = grep(files_10X, pattern = 'barcodes', value = T) - barcodesDT = data.table::fread(input = paste0(path_to_data,'/',barcodes_file), header = F) - barcodes_vec = barcodesDT$V1 - names(barcodes_vec) = 1:nrow(barcodesDT) - - # get features and create vector - features_file = grep(files_10X, pattern = 'features|genes', value = T) - featuresDT = data.table::fread(input = paste0(path_to_data,'/',features_file), header = F) - - g_name = colnames(featuresDT)[gene_column_index] - ## convert ensembl gene id to gene symbol ## - ## TODO - - featuresDT[, total := .N, by = get(g_name)] - featuresDT[, gene_symbol := ifelse(total > 1, paste0(get(g_name),'--',1:.N), get(g_name)), by = get(g_name)] - features_vec = featuresDT$gene_symbol - names(features_vec) = 1:nrow(featuresDT) - - # get matrix - matrix_file = grep(files_10X, pattern = 'matrix', value = T) - MMmatrix = Matrix::readMM(paste0(path_to_data,'/',matrix_file)) - rownames(MMmatrix) = features_vec - colnames(MMmatrix) = barcodes_vec - - - # Split by type of feature (features.tsv 3rd col) - feat_classes = unique(featuresDT$V3) - if(length(feat_classes) > 1 && isTRUE(split_by_type)) { - result_list = list() - - for(fclass in feat_classes) { - result_list[[fclass]] = MMmatrix[featuresDT$V3 == fclass, ] - } +get10Xmatrix <- function(path_to_data, + gene_column_index = 1, + remove_zero_rows = TRUE, + split_by_type = TRUE) { + # data.table variables + total <- gene_symbol <- gene_id <- gene_id_num <- cell_id <- + cell_id_num <- sort_gene_id_num <- NULL + + # data directory + files_10X <- list.files(path_to_data) + + # get barcodes and create vector + barcodes_file <- grep(files_10X, pattern = "barcodes", value = TRUE) + barcodesDT <- data.table::fread( + input = paste0(path_to_data, "/", barcodes_file), header = FALSE) + barcodes_vec <- barcodesDT$V1 + names(barcodes_vec) <- 1:nrow(barcodesDT) + + # get features and create vector + features_file <- grep(files_10X, pattern = "features|genes", value = TRUE) + featuresDT <- data.table::fread( + input = paste0(path_to_data, "/", features_file), header = FALSE) + + g_name <- colnames(featuresDT)[gene_column_index] + ## convert ensembl gene id to gene symbol ## + ## TODO + + featuresDT[, total := .N, by = get(g_name)] + featuresDT[, gene_symbol := ifelse( + total > 1, paste0(get(g_name), "--", 1:.N), + get(g_name)), by = get(g_name)] + features_vec <- featuresDT$gene_symbol + names(features_vec) <- 1:nrow(featuresDT) + + # get matrix + matrix_file <- grep(files_10X, pattern = "matrix", value = TRUE) + MMmatrix <- Matrix::readMM(paste0(path_to_data, "/", matrix_file)) + rownames(MMmatrix) <- features_vec + colnames(MMmatrix) <- 
barcodes_vec + + + # Split by type of feature (features.tsv 3rd col) + feat_classes <- unique(featuresDT$V3) + if (length(feat_classes) > 1 && isTRUE(split_by_type)) { + result_list <- list() + + for (fclass in feat_classes) { + result_list[[fclass]] <- MMmatrix[featuresDT$V3 == fclass, ] + } - if(isTRUE(remove_zero_rows)) { - result_list = lapply(result_list, function(MMmatrix) { - rowsums_result = rowSums_flex(MMmatrix) - rowsums_bool = rowsums_result != 0 - MMmatrix = MMmatrix[rowsums_bool, ] - }) - } + if (isTRUE(remove_zero_rows)) { + result_list <- lapply(result_list, function(MMmatrix) { + rowsums_result <- rowSums_flex(MMmatrix) + rowsums_bool <- rowsums_result != 0 + MMmatrix <- MMmatrix[rowsums_bool, ] + }) + } - return(result_list) - } else { + return(result_list) + } else { + if (remove_zero_rows == TRUE) { + rowsums_result <- rowSums_flex(MMmatrix) + rowsums_bool <- rowsums_result != 0 + MMmatrix <- MMmatrix[rowsums_bool, ] + } - if(remove_zero_rows == TRUE) { - rowsums_result = rowSums_flex(MMmatrix) - rowsums_bool = rowsums_result != 0 - MMmatrix = MMmatrix[rowsums_bool, ] + return(MMmatrix) } - - return(MMmatrix) - - } - } @@ -505,126 +517,128 @@ get10Xmatrix = function(path_to_data, #' @title get10Xmatrix_h5 #' @name get10Xmatrix_h5 -#' @description This function creates an expression matrix from a 10X h5 file path +#' @description This function creates an expression matrix from a 10X h5 file +#' path #' @param path_to_data path to the 10X .h5 file -#' @param gene_ids use gene symbols (default) or ensembl ids for the gene expression matrix +#' @param gene_ids use gene symbols (default) or ensembl ids for the gene +#' expression matrix #' @inheritParams get10Xmatrix #' @return (list of) sparse expression matrix from 10X -#' @details If the .h5 10x file has multiple classes of features (e.g. expression vs QC -#' probes) or modalities (e.g. RNA and protein), and \code{split_by_type} param is \code{TRUE}, -#' multiple matrices will be returned +#' @details If the .h5 10x file has multiple classes of features +#' (e.g. expression vs QC probes) or modalities (e.g. RNA and protein), and +#' \code{split_by_type} param is \code{TRUE}, multiple matrices will be returned #' @export -get10Xmatrix_h5 = function(path_to_data, - gene_ids = c('symbols', 'ensembl'), - remove_zero_rows = TRUE, - split_by_type = TRUE) { - - ## function inspired by and modified from the VISION package - ## see read_10x_h5_v3 in https://github.com/YosefLab/VISION/blob/master/R/Utilities.R - - # verify if optional package is installed - package_check(pkg_name = "hdf5r", repository = "CRAN") - - # select parameter - gene_ids = match.arg(gene_ids, choices = c('symbols', 'ensembl')) - - h5 = hdf5r::H5File$new(path_to_data) - - tryCatch({ - - # list objects part of the h5 file - # hdf5r::list.objects(h5) - - # get root folder name e.g. 
'matrix' - root <- names(h5) - root <- root[1] - - # extraction information - data <- h5[[paste0(root, "/data")]][] - data <- as.numeric(data) - - barcodes = h5[[paste0(root, "/barcodes")]][] - feature_tag_keys = h5[[paste0(root, "/features/_all_tag_keys")]][] - feature_types = h5[[paste0(root, "/features/feature_type")]][] - genome = unique(h5[[paste0(root, "/features/genome")]][]) - feature_id = h5[[paste0(root, "/features/id")]][] - feature_names = h5[[paste0(root, "/features/name")]][] - indices = h5[[paste0(root, "/indices")]][] - indptr = h5[[paste0(root, "/indptr")]][] - data_shape = h5[[paste0(root, "/shape")]][] - - # create a feature data.table - features_dt = data.table::data.table( - 'id' = feature_id, - 'name' = feature_names, - 'feature_type' = feature_types, - 'genome' = genome +get10Xmatrix_h5 <- function(path_to_data, + gene_ids = c("symbols", "ensembl"), + remove_zero_rows = TRUE, + split_by_type = TRUE) { + ## function inspired by and modified from the VISION package + ## see read_10x_h5_v3 in + ## https://github.com/YosefLab/VISION/blob/master/R/Utilities.R + + # verify if optional package is installed + package_check(pkg_name = "hdf5r", repository = "CRAN") + + # select parameter + gene_ids <- match.arg(gene_ids, choices = c("symbols", "ensembl")) + + h5 <- hdf5r::H5File$new(path_to_data) + + tryCatch( + { + # list objects part of the h5 file + # hdf5r::list.objects(h5) + + # get root folder name e.g. 'matrix' + root <- names(h5) + root <- root[1] + + # extraction information + data <- h5[[paste0(root, "/data")]][] + data <- as.numeric(data) + + barcodes <- h5[[paste0(root, "/barcodes")]][] + feature_tag_keys <- h5[[paste0(root, "/features/_all_tag_keys")]][] + feature_types <- h5[[paste0(root, "/features/feature_type")]][] + genome <- unique(h5[[paste0(root, "/features/genome")]][]) + feature_id <- h5[[paste0(root, "/features/id")]][] + feature_names <- h5[[paste0(root, "/features/name")]][] + indices <- h5[[paste0(root, "/indices")]][] + indptr <- h5[[paste0(root, "/indptr")]][] + data_shape <- h5[[paste0(root, "/shape")]][] + + # create a feature data.table + features_dt <- data.table::data.table( + "id" = feature_id, + "name" = feature_names, + "feature_type" = feature_types, + "genome" = genome + ) + }, + finally = { + h5$close_all() + } ) - }, - finally = { - h5$close_all() - }) - - # create uniq name symbols - # duplicate gene symbols will be given a suffix '_1', '_2', ... - - # data.table variables - nr_name = name = uniq_name = NULL + # create uniq name symbols + # duplicate gene symbols will be given a suffix '_1', '_2', ... - features_dt[, nr_name := 1:.N, by = name] - features_dt[, uniq_name := ifelse(nr_name == 1, name, paste0(name, '_', (nr_name-1)))] + # data.table variables + nr_name <- name <- uniq_name <- NULL + features_dt[, nr_name := 1:.N, by = name] + features_dt[, uniq_name := ifelse( + nr_name == 1, name, paste0(name, "_", (nr_name - 1)))] - # dimension names - dimnames = list(feature_id, barcodes) - sparsemat = Matrix::sparseMatrix(i = indices + 1, - p = indptr, - x = data, - dims = data_shape, - dimnames = dimnames) + # dimension names + dimnames <- list(feature_id, barcodes) - # multiple feature classes (e.g. gene vs diagnostic or even modalities?) 
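# --- Illustrative sketch (editor's note, not part of the patch) -------------
# Hedged usage sketch for get10Xmatrix() shown above, assuming a standard
# Cell Ranger output folder (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz).
# The path is hypothetical. gene_column_index = 2 selects gene symbols when
# features.tsv carries ensembl ids in column 1 and symbols in column 2; with
# split_by_type = TRUE a list of matrices (one per feature class) can be
# returned instead of a single matrix.
mats <- get10Xmatrix(
    path_to_data = "outs/raw_feature_bc_matrix",   # hypothetical path
    gene_column_index = 2,
    remove_zero_rows = TRUE,
    split_by_type = TRUE
)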
- if(isTRUE(split_by_type)) { - result_list = list() - - for(fclass in unique(feature_types)) { - - result_list[[fclass]] = sparsemat[features_dt$feature_type == fclass, ] + sparsemat <- Matrix::sparseMatrix( + i = indices + 1, + p = indptr, + x = data, + dims = data_shape, + dimnames = dimnames + ) - # change names to gene symbols if it's expression - if(fclass == 'Gene Expression' & gene_ids == 'symbols') { + # multiple feature classes (e.g. gene vs diagnostic or even modalities?) + if (isTRUE(split_by_type)) { + result_list <- list() - conv_vector = features_dt$uniq_name - names(conv_vector) = features_dt$id + for (fclass in unique(feature_types)) { + result_list[[fclass]] <- sparsemat[ + features_dt$feature_type == fclass, ] - current_names = rownames(result_list[[fclass]]) - new_names = conv_vector[current_names] - rownames(result_list[[fclass]]) = new_names - } - } + # change names to gene symbols if it's expression + if (fclass == "Gene Expression" & gene_ids == "symbols") { + conv_vector <- features_dt$uniq_name + names(conv_vector) <- features_dt$id - if(isTRUE(remove_zero_rows)) { - result_list = lapply(result_list, function(sparsemat) { - rowsums_result = rowSums_flex(sparsemat) - rowsums_bool = rowsums_result != 0 - sparsemat = sparsemat[rowsums_bool, ] - }) - } + current_names <- rownames(result_list[[fclass]]) + new_names <- conv_vector[current_names] + rownames(result_list[[fclass]]) <- new_names + } + } - return(result_list) + if (isTRUE(remove_zero_rows)) { + result_list <- lapply(result_list, function(sparsemat) { + rowsums_result <- rowSums_flex(sparsemat) + rowsums_bool <- rowsums_result != 0 + sparsemat <- sparsemat[rowsums_bool, ] + }) + } - } else { + return(result_list) + } else { + if (isTRUE(remove_zero_rows)) { + rowsums_result <- rowSums_flex(sparsemat) + rowsums_bool <- rowsums_result != 0 + sparsemat <- sparsemat[rowsums_bool, ] + } - if(isTRUE(remove_zero_rows)) { - rowsums_result = rowSums_flex(sparsemat) - rowsums_bool = rowsums_result != 0 - sparsemat = sparsemat[rowsums_bool, ] + return(list("Gene Expression" = sparsemat)) } - - return(list('Gene Expression' = sparsemat)) - } } @@ -639,11 +653,12 @@ get10Xmatrix_h5 = function(path_to_data, #' @title readPolygonFilesVizgenHDF5 #' @name readPolygonFilesVizgenHDF5_old -#' @description Read and create polygons for all cells, or for only selected FOVs. +#' @description Read and create polygons for all cells, or for only selected +#' FOVs. #' @param boundaries_path path to the cell_boundaries folder #' @param fovs subset of fovs to use -#' @param custom_polygon_names a character vector to provide custom polygon names -#' (optional) +#' @param custom_polygon_names a character vector to provide custom polygon +#' names (optional) #' @param polygon_feat_types a vector containing the polygon feature types #' @param flip_x_axis flip x axis of polygon coordinates (multiply by -1) #' @param flip_y_axis flip y axis of polygon coordinates (multiply by -1) @@ -654,156 +669,161 @@ get10Xmatrix_h5 = function(path_to_data, #' @param cores cores to use #' @param verbose be verbose #' @seealso \code{\link{smoothGiottoPolygons}} -#' @details Set H5Fopen_flags to "H5F_ACC_RDONLY" if you encounter permission issues. +#' @details Set H5Fopen_flags to "H5F_ACC_RDONLY" if you encounter permission +#' issues. 
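A minimal usage sketch of the function documented above, for orientation only. The boundaries folder and FOV numbers are hypothetical; the arguments shown are restricted to those listed in the parameter block, with H5Fopen_flags set to read-only as the @details note recommends when permission problems occur.
# hypothetical MERSCOPE output folder containing feature_data_<fov>.hdf5 files
vizgen_polys <- readPolygonFilesVizgenHDF5_old(
    boundaries_path = "path/to/cell_boundaries",  # hypothetical path
    fovs = c(0, 1),                               # only read these FOVs
    polygon_feat_types = 0:6,                     # z indices; results are named z0..z6
    smooth_polygons = TRUE,                       # smooth each giottoPolygon
    smooth_vertices = 60,
    H5Fopen_flags = "H5F_ACC_RDONLY"              # read-only open avoids permission issues
)
# returns a list of giottoPolygon objects, one per requested z index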
#' @export -readPolygonFilesVizgenHDF5_old = function(boundaries_path, - fovs = NULL, - polygon_feat_types = 0:6, - custom_polygon_names = NULL, - flip_x_axis = F, - flip_y_axis = F, - smooth_polygons = TRUE, - smooth_vertices = 60, - set_neg_to_zero = FALSE, - H5Fopen_flags = "H5F_ACC_RDWR", - cores = NA, - verbose = TRUE) { - - # necessary pkgs - package_check(pkg_name = 'rhdf5', repository = 'Bioc') - - cores = determine_cores(cores) - - # data.table vars - x = y = cell_id = file_id = my_id = NULL - - # prepare poly feat names - poly_feat_names = paste0('z', polygon_feat_types) - poly_feat_indexes = paste0('zIndex_', polygon_feat_types) - - # provide your own custom names - if(!is.null(custom_polygon_names)) { - - if(!is.character(custom_polygon_names)) { - stop(wrap_txt('If custom_polygon_names are provided, it needs to be a character vector')) +readPolygonFilesVizgenHDF5_old <- function(boundaries_path, + fovs = NULL, + polygon_feat_types = 0:6, + custom_polygon_names = NULL, + flip_x_axis = FALSE, + flip_y_axis = FALSE, + smooth_polygons = TRUE, + smooth_vertices = 60, + set_neg_to_zero = FALSE, + H5Fopen_flags = "H5F_ACC_RDWR", + cores = NA, + verbose = TRUE) { + # necessary pkgs + package_check(pkg_name = "rhdf5", repository = "Bioc") + + cores <- determine_cores(cores) + + # data.table vars + x <- y <- cell_id <- file_id <- my_id <- NULL + + # prepare poly feat names + poly_feat_names <- paste0("z", polygon_feat_types) + poly_feat_indexes <- paste0("zIndex_", polygon_feat_types) + + # provide your own custom names + if (!is.null(custom_polygon_names)) { + if (!is.character(custom_polygon_names)) { + stop(wrap_txt("If custom_polygon_names are provided, it needs to + be a character vector")) + } + + if (length(custom_polygon_names) != length(poly_feat_names)) { + stop(wrap_txt("length of custom names need to be same as + polygon_feat_types")) + } else { + poly_feat_names <- custom_polygon_names + } } - if(length(custom_polygon_names) != length(poly_feat_names)) { - stop(wrap_txt('length of custom names need to be same as polygon_feat_types')) + if (isTRUE(verbose)) wrap_msg("Reading from:", boundaries_path) + # list all files in the folder + hdf5_boundary_list <- list.files(full.names = TRUE, boundaries_path) + # only load subset of files if fov is given + if (!is.null(fovs)) { + selected_hdf5s <- paste0("feature_data_", fovs, ".hdf5") + selected_hdf5s_concatenated <- paste0(selected_hdf5s, collapse = "|") + hdf5_boundary_selected_list <- grep( + selected_hdf5s_concatenated, x = hdf5_boundary_list, value = TRUE) } else { - poly_feat_names = custom_polygon_names + hdf5_boundary_selected_list <- hdf5_boundary_list } - } - - if(isTRUE(verbose)) wrap_msg('Reading from:', boundaries_path) - # list all files in the folder - hdf5_boundary_list = list.files(full.names = TRUE, boundaries_path) - # only load subset of files if fov is given - if(!is.null(fovs)) { - - selected_hdf5s = paste0('feature_data_', fovs, '.hdf5') - selected_hdf5s_concatenated = paste0(selected_hdf5s, collapse = '|') - hdf5_boundary_selected_list = grep(selected_hdf5s_concatenated, x = hdf5_boundary_list, value = TRUE) - - } else { - hdf5_boundary_selected_list = hdf5_boundary_list - } - - if(isTRUE(verbose)) wrap_msg('finished listing .hdf5 files - start extracting .hdf5 information') - - # open selected polygon files - hdf5_list_length = length(hdf5_boundary_selected_list) - - # append data from all FOVs to single list - init = proc.time() - progressr::with_progress({ - pb = progressr::progressor(along = 
hdf5_boundary_selected_list) - read_list = lapply_flex(seq_along(hdf5_boundary_selected_list), - cores = cores, - future.packages = c('rhdf5', 'Rhdf5lib'), - function(bound_i) { - - # get feature data - read_file = rhdf5::H5Fopen(hdf5_boundary_selected_list[[bound_i]][[1]], flags = H5Fopen_flags) - fov_info = read_file$featuredata - - # update progress - if(verbose) print(basename(hdf5_boundary_selected_list[[bound_i]])) - elapsed = (proc.time() - init)[[3L]] - step_time = elapsed/bound_i - est = (hdf5_list_length * step_time) - elapsed - pb(message = c('// E:', time_format(elapsed), '| R:', time_format(est))) - rhdf5::H5Fclose(read_file) - return(fov_info) + + if (isTRUE(verbose)) + wrap_msg("finished listing .hdf5 files start extracting .hdf5 + information") + + # open selected polygon files + hdf5_list_length <- length(hdf5_boundary_selected_list) + + # append data from all FOVs to single list + init <- proc.time() + progressr::with_progress({ + pb <- progressr::progressor(along = hdf5_boundary_selected_list) + read_list <- lapply_flex(seq_along(hdf5_boundary_selected_list), + cores = cores, + future.packages = c("rhdf5", "Rhdf5lib"), + function(bound_i) { + # get feature data + read_file <- rhdf5::H5Fopen( + hdf5_boundary_selected_list[[bound_i]][[1]], + flags = H5Fopen_flags) + fov_info <- read_file$featuredata + + # update progress + if (verbose) + print(basename(hdf5_boundary_selected_list[[bound_i]])) + elapsed <- (proc.time() - init)[[3L]] + step_time <- elapsed / bound_i + est <- (hdf5_list_length * step_time) - elapsed + pb(message = c( + "// E:", time_format(elapsed), "| R:", time_format(est))) + rhdf5::H5Fclose(read_file) + return(fov_info) + } + ) }) - }) - - # # combine to FOV data single list - read_list = Reduce('append', read_list) - cell_names = names(read_list) - - - # extract values for each z index and cell from read_list - result_list = lapply_flex(seq_along(poly_feat_indexes), cores = cores, function(z_i) { - # Reduce('append', lapply_flex(seq_along(read_list), cores = cores, function(bound_i) { - # cell_names = names(read_list[[bound_i]]) - # lapply_flex(seq_along(read_list[[bound_i]]), cores = cores, function(cell_i) { - lapply_flex(seq_along(read_list), cores = cores, function(cell_i) { - # singlearray = read_list[[bound_i]][[cell_i]][[poly_feat_indexes[z_i]]]$p_0$coordinates - singlearray = read_list[[cell_i]][[poly_feat_indexes[z_i]]]$p_0$coordinates - cell_name = cell_names[[cell_i]] - if(!is.null(singlearray)) { - singlearraydt = data.table::as.data.table(t_flex(as.matrix(singlearray[,,1]))) - data.table::setnames(singlearraydt, old = c('V1', 'V2'), new = c('x', 'y')) - if(flip_x_axis) singlearraydt[, x := -1 * x] - if(flip_y_axis) singlearraydt[, y := -1 * y] - - # singlearraydt[, file_id := paste0('file', bound_i)] - singlearraydt[, cell_id := cell_name] - # singlearraydt[, my_id := paste0('cell', cell_i)] - } - }) - # })) - }) - result_list_rbind = lapply_flex(seq_along(result_list), cores = cores, function(z_i) { - data.table::rbindlist(result_list[[z_i]]) - }) - - - - if(isTRUE(verbose)) wrap_msg('finished extracting .hdf5 files - start creating polygons') - - - # create Giotto polygons and add them to gobject - # smooth_cell_polygons_list = list() - progressr::with_progress({ - pb = progressr::progressor(along = result_list_rbind) - smooth_cell_polygons_list = lapply_flex(seq_along(result_list_rbind), cores = cores, function(i) { - dfr_subset = result_list_rbind[[i]][,.(x, y, cell_id)] - cell_polygons = createGiottoPolygonsFromDfr(segmdfr = dfr_subset, 
- name = poly_feat_names[i], - verbose = verbose) - - pb(message = poly_feat_names[i]) - - if(smooth_polygons == TRUE) { - return(smoothGiottoPolygons(cell_polygons, - vertices = smooth_vertices, - set_neg_to_zero = set_neg_to_zero)) - } else { - return(cell_polygons) - } + + # # combine to FOV data single list + read_list <- Reduce("append", read_list) + cell_names <- names(read_list) + + + # extract values for each z index and cell from read_list + result_list <- lapply_flex( + seq_along(poly_feat_indexes), cores = cores, function(z_i) { + lapply_flex(seq_along(read_list), cores = cores, function(cell_i) { + singlearray <- read_list[[cell_i]][[ + poly_feat_indexes[z_i]]]$p_0$coordinates + cell_name <- cell_names[[cell_i]] + if (!is.null(singlearray)) { + singlearraydt <- data.table::as.data.table(t_flex( + as.matrix(singlearray[, , 1]))) + data.table::setnames( + singlearraydt, old = c("V1", "V2"), new = c("x", "y")) + if (flip_x_axis) singlearraydt[, x := -1 * x] + if (flip_y_axis) singlearraydt[, y := -1 * y] + + singlearraydt[, cell_id := cell_name] + } + }) + }) + result_list_rbind <- lapply_flex( + seq_along(result_list), cores = cores, function(z_i) { + data.table::rbindlist(result_list[[z_i]]) }) - }) - # TODO: add spatial centroids - # needs to happen after smoothing to be correct - return(smooth_cell_polygons_list) + if (isTRUE(verbose)) + wrap_msg("finished extracting .hdf5 files start creating polygons") + + + # create Giotto polygons and add them to gobject + progressr::with_progress({ + pb <- progressr::progressor(along = result_list_rbind) + smooth_cell_polygons_list <- lapply_flex(seq_along(result_list_rbind), + cores = cores, function(i) { + dfr_subset <- result_list_rbind[[i]][, .(x, y, cell_id)] + cell_polygons <- createGiottoPolygonsFromDfr( + segmdfr = dfr_subset, + name = poly_feat_names[i], + verbose = verbose + ) + + pb(message = poly_feat_names[i]) + + if (smooth_polygons == TRUE) { + return(smoothGiottoPolygons(cell_polygons, + vertices = smooth_vertices, + set_neg_to_zero = set_neg_to_zero + )) + } else { + return(cell_polygons) + } + }) + }) + + + # TODO: add spatial centroids + # needs to happen after smoothing to be correct + return(smooth_cell_polygons_list) } @@ -812,15 +832,15 @@ readPolygonFilesVizgenHDF5_old = function(boundaries_path, #' @title readPolygonFilesVizgenHDF5 #' @name readPolygonFilesVizgenHDF5 #' @description Read polygon info for all cells or for only selected FOVs from -#' Vizgen HDF5 files. Data is returned as a list of giottoPolygons or data.tables -#' of the requested z indices. +#' Vizgen HDF5 files. Data is returned as a list of giottoPolygons or +#' data.tables of the requested z indices. #' @param boundaries_path path to the cell_boundaries folder #' @param fovs subset of fovs to use #' @param z_indices z indices of polygons to use #' @param segm_to_use segmentation results to use (usually = 1. Depends on if #' alternative segmentations were generated) -#' @param custom_polygon_names a character vector to provide custom polygon names -#' (optional) +#' @param custom_polygon_names a character vector to provide custom polygon +#' names (optional) #' @param polygon_feat_types deprecated. 
Use \code{z_indices} #' @param flip_x_axis flip x axis of polygon coordinates (multiply by -1) #' @param flip_y_axis flip y axis of polygon coordinates (multiply by -1) @@ -830,310 +850,328 @@ readPolygonFilesVizgenHDF5_old = function(boundaries_path, #' @param calc_centroids calculate centroids (default = FALSE) #' @param H5Fopen_flags see \code{\link[rhdf5]{H5Fopen}} for more details #' @param cores cores to use -#' @param create_gpoly_parallel (default = TRUE) Whether to run gpoly creation in -#' parallel +#' @param create_gpoly_parallel (default = TRUE) Whether to run gpoly creation +#' in parallel #' @param create_gpoly_bin (Optional, default = FALSE) Parallelization option. -#' Accepts integer values as an binning size when generating giottoPolygon objects +#' Accepts integer values as an binning size when generating giottoPolygon +#' objects #' @param verbose be verbose #' @param output whether to return as list of giottoPolygon or data.table #' @seealso \code{\link{smoothGiottoPolygons}} -#' @details Set H5Fopen_flags to "H5F_ACC_RDONLY" if you encounter permission issues. +#' @details Set H5Fopen_flags to "H5F_ACC_RDONLY" if you encounter permission +#' issues. #' @export -readPolygonFilesVizgenHDF5 = function(boundaries_path, - fovs = NULL, - z_indices = 1L:7L, - segm_to_use = 1L, - custom_polygon_names = NULL, - flip_x_axis = FALSE, - flip_y_axis = TRUE, - calc_centroids = FALSE, - smooth_polygons = TRUE, - smooth_vertices = 60L, - set_neg_to_zero = FALSE, - H5Fopen_flags = "H5F_ACC_RDWR", - cores = determine_cores(), - create_gpoly_parallel = TRUE, - create_gpoly_bin = FALSE, - verbose = TRUE, - output = c('giottoPolygon', 'data.table'), - polygon_feat_types = NULL) { - - # necessary pkgs - package_check(pkg_name = 'rhdf5', repository = 'Bioc') - - output = match.arg(output, choices = c('giottoPolygon', 'data.table')) - - # deprecation - if(!is.null(polygon_feat_types)) { - warning('polygon_feat_types is deprecated.\n Use z_indices instead') - z_indices = polygon_feat_types + 1L - } - - segm_to_use = paste0('p_', (segm_to_use - 1L)) - - # data.table vars - x = y = z = NULL - - # provide your own custom names - if(!is.null(custom_polygon_names)) { - - if(!is.character(custom_polygon_names)) { - stop(wrap_txt('If custom_polygon_names are provided, it needs to be a character vector')) - } - - if(length(custom_polygon_names) != length(z_indices)) { - stop(wrap_txt('length of custom names need to be same as z_indices')) +readPolygonFilesVizgenHDF5 <- function(boundaries_path, + fovs = NULL, + z_indices = 1L:7L, + segm_to_use = 1L, + custom_polygon_names = NULL, + flip_x_axis = FALSE, + flip_y_axis = TRUE, + calc_centroids = FALSE, + smooth_polygons = TRUE, + smooth_vertices = 60L, + set_neg_to_zero = FALSE, + H5Fopen_flags = "H5F_ACC_RDWR", + cores = determine_cores(), + create_gpoly_parallel = TRUE, + create_gpoly_bin = FALSE, + verbose = TRUE, + output = c("giottoPolygon", "data.table"), + polygon_feat_types = NULL) { + # necessary pkgs + package_check(pkg_name = "rhdf5", repository = "Bioc") + + output <- match.arg(output, choices = c("giottoPolygon", "data.table")) + + # deprecation + if (!is.null(polygon_feat_types)) { + warning("polygon_feat_types is deprecated.\n Use z_indices instead") + z_indices <- polygon_feat_types + 1L } - } - - if(isTRUE(verbose)) wrap_msg('Reading from:', boundaries_path) - # list all files in the folder - hdf5_boundary_list = list.files(full.names = TRUE, boundaries_path) - # only load subset of files if fov is given - if(!is.null(fovs)) { - - 
selected_hdf5s = paste0('feature_data_', fovs, '.hdf5') - selected_hdf5s_concatenated = paste0(selected_hdf5s, collapse = '|') - hdf5_boundary_selected_list = grep(selected_hdf5s_concatenated, x = hdf5_boundary_list, value = TRUE) - - } else { - hdf5_boundary_selected_list = hdf5_boundary_list - } - - if(isTRUE(verbose)) wrap_msg('finished listing .hdf5 files - start extracting .hdf5 information') - - # open selected polygon files - # hdf5_list_length = length(hdf5_boundary_selected_list) - - # append data from all FOVs to single list - init = Sys.time() - progressr::with_progress({ - pb = progressr::progressor(length(hdf5_boundary_selected_list)/5) - read_list = lapply_flex(seq_along(hdf5_boundary_selected_list), - future.packages = c('rhdf5', 'Rhdf5lib'), - function(init, z_indices, segm_to_use, bound_i) { - read_file = .h5_read_vizgen(h5File = hdf5_boundary_selected_list[[bound_i]][[1]], - z_indices = z_indices, - segm_to_use = segm_to_use, - H5Fopen_flags = H5Fopen_flags) - - # update progress - if(verbose) print(basename(hdf5_boundary_selected_list[[bound_i]])) - if(bound_i %% 5 == 0) { - pb() - } - - return(read_file) - }, - cores = cores, - init = init, - z_indices = z_indices, - segm_to_use = segm_to_use) - }) - - # combine to FOV data single list - read_DT = data.table::rbindlist(read_list) - - # perform any necessary flips - if(flip_x_axis) read_DT[, x := -1 * x] - if(flip_y_axis) read_DT[, y := -1 * y] - - # separate polygons by z index - zvals = read_DT[, unique(z)] - z_names = paste0('z', zvals) - z_read_DT = lapply(seq_along(zvals), function(z_idx) { - read_DT[z == zvals[z_idx],] - }) - names(z_read_DT) = z_names - if(!is.null(custom_polygon_names)) poly_names = custom_polygon_names - else poly_names = z_names - - if(isTRUE(verbose)) wrap_msg('finished extracting .hdf5 files') - - # outputs - switch( - output, - "giottoPolygon" = .create_giotto_polygons_vizgen( - z_read_DT = z_read_DT, - poly_names = poly_names, - set_neg_to_zero = set_neg_to_zero, - calc_centroids = calc_centroids, - smooth_polygons = smooth_polygons, - smooth_vertices = smooth_vertices, - create_gpoly_parallel = create_gpoly_parallel, - create_gpoly_bin = create_gpoly_bin, - verbose = verbose - ), - "data.table" = z_read_DT - ) -} + segm_to_use <- paste0("p_", (segm_to_use - 1L)) + # data.table vars + x <- y <- z <- NULL - -#' @keywords internal -#' @noRd -.create_giotto_polygons_vizgen = function(z_read_DT, - poly_names = names(z_read_DT), - set_neg_to_zero = FALSE, - calc_centroids = FALSE, - smooth_polygons = TRUE, - smooth_vertices = 60L, - create_gpoly_parallel = TRUE, - create_gpoly_bin = FALSE, - verbose = TRUE) { - checkmate::assert_list(z_read_DT) - checkmate::assert_numeric(smooth_vertices) - - # data.table vars - x = y = cell_id = poly_ID = NULL - - if(isTRUE(verbose)) wrap_msg('start creating polygons') - - # **** sequential method **** - if(!isTRUE(create_gpoly_parallel)) { - progressr::with_progress({ - pb = progressr::progressor(along = z_read_DT) - smooth_cell_polygons_list = lapply(seq_along(z_read_DT), function(i) { - dfr_subset = z_read_DT[[i]][,.(x, y, cell_id)] - data.table::setnames(dfr_subset, old = 'cell_id', new = 'poly_ID') - cell_polygons = createGiottoPolygonsFromDfr(segmdfr = dfr_subset, - name = poly_names[i], - calc_centroids = FALSE, - skip_eval_dfr = TRUE, - copy_dt = FALSE, - verbose = verbose) - if(isTRUE(smooth_polygons)) { - cell_polygons = smoothGiottoPolygons( - gpolygon = cell_polygons, - vertices = smooth_vertices, - k = 3L, - set_neg_to_zero = set_neg_to_zero - ) - 
} - if(isTRUE(calc_centroids)) { - # NOTE: will not recalculate if centroids are already attached - cell_polygons = centroids(cell_polygons, append_gpolygon = TRUE) + # provide your own custom names + if (!is.null(custom_polygon_names)) { + if (!is.character(custom_polygon_names)) { + stop(wrap_txt("If custom_polygon_names are provided, it needs to + be a character vector")) } - pb(message = c(poly_names[i], ' (', i, '/', length(z_read_DT), ')')) - return(cell_polygons) - }) - }) - return(smooth_cell_polygons_list) - } + if (length(custom_polygon_names) != length(z_indices)) { + stop(wrap_txt( + "length of custom names need to be same as z_indices")) + } + } - # **** parallel methods **** - # no binning - if(!is.numeric(create_gpoly_bin)) { + if (isTRUE(verbose)) wrap_msg("Reading from:", boundaries_path) + # list all files in the folder + hdf5_boundary_list <- list.files(full.names = TRUE, boundaries_path) + # only load subset of files if fov is given + if (!is.null(fovs)) { + selected_hdf5s <- paste0("feature_data_", fovs, ".hdf5") + selected_hdf5s_concatenated <- paste0(selected_hdf5s, collapse = "|") + hdf5_boundary_selected_list <- grep( + selected_hdf5s_concatenated, x = hdf5_boundary_list, value = TRUE) + } else { + hdf5_boundary_selected_list <- hdf5_boundary_list + } - progressr::with_progress({ - pb = progressr::progressor(along = z_read_DT) - smooth_cell_polygons_list = lapply_flex( - seq_along(z_read_DT), - future.packages = c('terra', 'stats', 'data.table'), - function(i) { - dfr_subset = z_read_DT[[i]][,.(x, y, cell_id)] - data.table::setnames(dfr_subset, old = 'cell_id', new = 'poly_ID') - cell_polygons = gpoly_from_dfr_smoothed_wrapped( - segmdfr = dfr_subset, - name = poly_names[i], - skip_eval_dfr = TRUE, - copy_dt = FALSE, - smooth_polygons = smooth_polygons, - vertices = smooth_vertices, - set_neg_to_zero = set_neg_to_zero, - calc_centroids = calc_centroids, - verbose = verbose - ) + if (isTRUE(verbose)) + message("finished listing .hdf5 files start extracting .hdf5 + information") - pb(message = c(poly_names[i], ' (', i, '/', length(z_read_DT), ')')) - return(cell_polygons) - } - ) - }) + # open selected polygon files - # unwrap results - smooth_cell_polygons_list = lapply(smooth_cell_polygons_list, function(x) { - slot(x, 'spatVector') = terra::vect(slot(x, 'spatVector')) - if(isTRUE(calc_centroids)) { - slot(x, 'spatVectorCentroids') = terra::vect(slot(x, 'spatVectorCentroids')) - } - return(x) + # append data from all FOVs to single list + init <- Sys.time() + progressr::with_progress({ + pb <- progressr::progressor(length(hdf5_boundary_selected_list) / 5) + read_list <- lapply_flex(seq_along(hdf5_boundary_selected_list), + future.packages = c("rhdf5", "Rhdf5lib"), + function(init, z_indices, segm_to_use, bound_i) { + read_file <- .h5_read_vizgen( + h5File = hdf5_boundary_selected_list[[bound_i]][[1]], + z_indices = z_indices, + segm_to_use = segm_to_use, + H5Fopen_flags = H5Fopen_flags + ) + + # update progress + if (verbose) + print(basename(hdf5_boundary_selected_list[[bound_i]])) + if (bound_i %% 5 == 0) { + pb() + } + + return(read_file) + }, + cores = cores, + init = init, + z_indices = z_indices, + segm_to_use = segm_to_use + ) }) - } else { - # with binning + # combine to FOV data single list + read_DT <- data.table::rbindlist(read_list) - dfr_subset = lapply(z_read_DT, function(bin, DT) { + # perform any necessary flips + if (flip_x_axis) read_DT[, x := -1 * x] + if (flip_y_axis) read_DT[, y := -1 * y] - DT = DT[,.(x, y, cell_id)] - data.table::setnames(DT, 
old = 'cell_id', new = 'poly_ID') - pid = DT[, unique(poly_ID)] + # separate polygons by z index + zvals <- read_DT[, unique(z)] + z_names <- paste0("z", zvals) + z_read_DT <- lapply(seq_along(zvals), function(z_idx) { + read_DT[z == zvals[z_idx], ] + }) + names(z_read_DT) <- z_names + if (!is.null(custom_polygon_names)) { + poly_names <- custom_polygon_names + } else { + poly_names <- z_names + } - bin_pid = data.table::data.table( - 'poly_ID' = pid, - 'bin_ID' = as.numeric( - cut(x = seq_along(pid), - breaks = ceiling(length(pid)/bin)) - ) - ) - DT = data.table::merge.data.table(DT, bin_pid, by = 'poly_ID', all.x = TRUE) - DT = split(DT, DT$bin_ID) + if (isTRUE(verbose)) message("finished extracting .hdf5 files") - }, bin = create_gpoly_bin) + # outputs + switch(output, + "giottoPolygon" = .create_giotto_polygons_vizgen( + z_read_DT = z_read_DT, + poly_names = poly_names, + set_neg_to_zero = set_neg_to_zero, + calc_centroids = calc_centroids, + smooth_polygons = smooth_polygons, + smooth_vertices = smooth_vertices, + create_gpoly_parallel = create_gpoly_parallel, + create_gpoly_bin = create_gpoly_bin, + verbose = verbose + ), + "data.table" = z_read_DT + ) +} - bin_steps = sum(unlist(lapply(dfr_subset, length))) - progressr::with_progress({ - pb = progressr::progressor(steps = bin_steps) - smooth_cell_polygons_list = lapply( # sequential across z index - seq_along(dfr_subset), - function(i) { - lapply_flex( # parallelize across bins - dfr_subset[[i]], - future.packages = c('terra', 'stats', 'data.table'), - function(bin_DT) { - cell_polygons = gpoly_from_dfr_smoothed_wrapped( - segmdfr = bin_DT, - name = poly_names[i], - skip_eval_dfr = TRUE, - copy_dt = FALSE, - smooth_polygons = smooth_polygons, - vertices = smooth_vertices, - set_neg_to_zero = set_neg_to_zero, - calc_centroids = calc_centroids, - verbose = verbose - ) - pb(message = c(poly_names[i], ' (', i, '/', length(dfr_subset), ')')) - return(cell_polygons) - } - ) - } - ) - }) - # unwrap results - smooth_cell_polygons_list = lapply(seq_along(smooth_cell_polygons_list), function(i) { - p_list = lapply(smooth_cell_polygons_list[[i]], function(x) { - slot(x, 'spatVector') = terra::vect(slot(x, 'spatVector')) - if(isTRUE(calc_centroids)) { - slot(x, 'spatVectorCentroids') = terra::vect(slot(x, 'spatVectorCentroids')) - } - return(x) - }) - # rbind results - names(p_list) = NULL - return(do.call('rbind', p_list)) - }) +#' @keywords internal +#' @noRd +.create_giotto_polygons_vizgen <- function(z_read_DT, + poly_names = names(z_read_DT), + set_neg_to_zero = FALSE, + calc_centroids = FALSE, + smooth_polygons = TRUE, + smooth_vertices = 60L, + create_gpoly_parallel = TRUE, + create_gpoly_bin = FALSE, + verbose = TRUE) { + checkmate::assert_list(z_read_DT) + checkmate::assert_numeric(smooth_vertices) + + # data.table vars + x <- y <- cell_id <- poly_ID <- NULL + + if (isTRUE(verbose)) message("start creating polygons") + + # **** sequential method **** + if (!isTRUE(create_gpoly_parallel)) { + progressr::with_progress({ + pb <- progressr::progressor(along = z_read_DT) + smooth_cell_polygons_list <- lapply( + seq_along(z_read_DT), function(i) { + dfr_subset <- z_read_DT[[i]][, .(x, y, cell_id)] + data.table::setnames( + dfr_subset, old = "cell_id", new = "poly_ID") + cell_polygons <- createGiottoPolygonsFromDfr( + segmdfr = dfr_subset, + name = poly_names[i], + calc_centroids = FALSE, + skip_eval_dfr = TRUE, + copy_dt = FALSE, + verbose = verbose + ) + if (isTRUE(smooth_polygons)) { + cell_polygons <- smoothGiottoPolygons( + gpolygon = 
cell_polygons, + vertices = smooth_vertices, + k = 3L, + set_neg_to_zero = set_neg_to_zero + ) + } + if (isTRUE(calc_centroids)) { + # NOTE: won't recalculate if centroids are already attached + cell_polygons <- centroids( + cell_polygons, append_gpolygon = TRUE) + } + pb(message = c( + poly_names[i], " (", i, "/", length(z_read_DT), ")")) + return(cell_polygons) + }) + }) + return(smooth_cell_polygons_list) + } - } + # **** parallel methods **** + # no binning + if (!is.numeric(create_gpoly_bin)) { + progressr::with_progress({ + pb <- progressr::progressor(along = z_read_DT) + smooth_cell_polygons_list <- lapply_flex( + seq_along(z_read_DT), + future.packages = c("terra", "stats", "data.table"), + function(i) { + dfr_subset <- z_read_DT[[i]][, .(x, y, cell_id)] + data.table::setnames( + dfr_subset, old = "cell_id", new = "poly_ID") + cell_polygons <- gpoly_from_dfr_smoothed_wrapped( + segmdfr = dfr_subset, + name = poly_names[i], + skip_eval_dfr = TRUE, + copy_dt = FALSE, + smooth_polygons = smooth_polygons, + vertices = smooth_vertices, + set_neg_to_zero = set_neg_to_zero, + calc_centroids = calc_centroids, + verbose = verbose + ) + + pb(message = c( + poly_names[i], " (", i, "/", length(z_read_DT), ")")) + return(cell_polygons) + } + ) + }) + + # unwrap results + smooth_cell_polygons_list <- lapply( + smooth_cell_polygons_list, function(x) { + slot(x, "spatVector") <- terra::vect(slot(x, "spatVector")) + if (isTRUE(calc_centroids)) { + slot(x, "spatVectorCentroids") <- terra::vect( + slot(x, "spatVectorCentroids")) + } + return(x) + }) + } else { + # with binning + + dfr_subset <- lapply(z_read_DT, function(bin, DT) { + DT <- DT[, .(x, y, cell_id)] + data.table::setnames(DT, old = "cell_id", new = "poly_ID") + pid <- DT[, unique(poly_ID)] + + bin_pid <- data.table::data.table( + "poly_ID" = pid, + "bin_ID" = as.numeric( + cut( + x = seq_along(pid), + breaks = ceiling(length(pid) / bin) + ) + ) + ) + DT <- data.table::merge.data.table( + DT, bin_pid, by = "poly_ID", all.x = TRUE) + DT <- split(DT, DT$bin_ID) + }, bin = create_gpoly_bin) + + bin_steps <- sum(unlist(lapply(dfr_subset, length))) + + progressr::with_progress({ + pb <- progressr::progressor(steps = bin_steps) + smooth_cell_polygons_list <- lapply( # sequential across z index + seq_along(dfr_subset), + function(i) { + lapply_flex( # parallelize across bins + dfr_subset[[i]], + future.packages = c("terra", "stats", "data.table"), + function(bin_DT) { + cell_polygons <- gpoly_from_dfr_smoothed_wrapped( + segmdfr = bin_DT, + name = poly_names[i], + skip_eval_dfr = TRUE, + copy_dt = FALSE, + smooth_polygons = smooth_polygons, + vertices = smooth_vertices, + set_neg_to_zero = set_neg_to_zero, + calc_centroids = calc_centroids, + verbose = verbose + ) + + pb(message = c( + poly_names[i], " (", i, "/", + length(dfr_subset), ")")) + return(cell_polygons) + } + ) + } + ) + }) + + # unwrap results + smooth_cell_polygons_list <- lapply( + seq_along(smooth_cell_polygons_list), function(i) { + p_list <- lapply(smooth_cell_polygons_list[[i]], function(x) { + slot(x, "spatVector") <- terra::vect(slot(x, "spatVector")) + if (isTRUE(calc_centroids)) { + slot(x, "spatVectorCentroids") <- terra::vect( + slot(x, "spatVectorCentroids")) + } + return(x) + }) + # rbind results + names(p_list) <- NULL + return(do.call("rbind", p_list)) + }) + } - return(smooth_cell_polygons_list) + return(smooth_cell_polygons_list) } @@ -1145,96 +1183,98 @@ readPolygonFilesVizgenHDF5 = function(boundaries_path, #' @title Read MERSCOPE polygons from parquet #' @name 
readPolygonVizgenParquet #' @description -#' Read Vizgen exported cell boundary parquet files as giottoPolyons. The z level -#' can be selected. +#' Read Vizgen exported cell boundary parquet files as giottoPolyons. The z +#' level can be selected. #' @param file parquet file to load -#' @param z_index either 'all' or a numeric vector of z_indices to get polygons for +#' @param z_index either 'all' or a numeric vector of z_indices to get polygons +#' for #' @param calc_centroids calculate centroids for the polygons (default = TRUE) #' @param verbose be verbose #' @export -readPolygonVizgenParquet = function( - file, - z_index = 'all', - calc_centroids = TRUE, - verbose = TRUE -) { - - # package checks - package_check('arrow') - package_check('sf') - package_check('dplyr') - - - checkmate::assert_file_exists(file) - if(!setequal(z_index, 'all')) { - checkmate::assert_numeric(z_index) - } else { - checkmate::assert_true(identical(z_index, 'all')) - } - - # NSE vars - ZIndex = Geometry = NULL - - # 1. determine z indices to get - avail_z_idx = arrow::open_dataset(file) %>% - dplyr::distinct(ZIndex) %>% - dplyr::pull() %>% - # dplyr::pull(as_vector = TRUE) %>% # switch to this in future and add arrow version requirement - sort() - - get_z_idx = if(setequal(z_index, 'all')) { - avail_z_idx - } else if(is.numeric(z_index)) { - z_index = as.integer(z_index) - if(!all(z_index %in% avail_z_idx)) { - stop(paste('Not all z indices found in cell boundaries.\n Existing indices are:', paste(avail_z_idx, collapse = ' '))) +readPolygonVizgenParquet <- function( + file, + z_index = "all", + calc_centroids = TRUE, + verbose = TRUE) { + # package checks + package_check("arrow") + package_check("sf") + package_check("dplyr") + + + checkmate::assert_file_exists(file) + if (!setequal(z_index, "all")) { + checkmate::assert_numeric(z_index) + } else { + checkmate::assert_true(identical(z_index, "all")) } - z_index - } - if(isTRUE(verbose)) message('loading poly z_indices: ', paste(get_z_idx, collapse = ' ')) - - - # 2. collect by z index filter and convert WKB to multipolygon - multipolygons = lapply_flex( - get_z_idx, - function(z_idx) { - # set schema - schema = arrow::open_dataset(file)$schema - schema$EntityID = arrow::string() - - # read and convert - arrow::open_dataset(file, schema = schema) %>% - dplyr::filter(ZIndex == z_idx) %>% - dplyr::collect() %>% - dplyr::mutate(Geometry = sf::st_as_sfc(Geometry)) - }, - future.seed = TRUE - ) - names(multipolygons) = lapply(multipolygons, function(x) paste0('z', unique(x$ZIndex))) - - - # 3. convert to giottoPolygons and append meta - out = lapply(seq_along(multipolygons), function(i) { - # append poly IDs and meta - poly_table = multipolygons[[i]] - sv = terra::vect(poly_table$Geometry) - sv$poly_ID = poly_table$EntityID - sv$z_level = poly_table$ZLevel - - gpoly = giottoPolygon( - name = names(multipolygons)[[i]], - spatVector = sv, - unique_ID_cache = poly_table$EntityID - ) - if(isTRUE(calc_centroids)) { - # NOTE: will not recalculate if centroids are already attached - gpoly = GiottoClass::centroids(x = gpoly, append_gpolygon = TRUE) + # NSE vars + ZIndex <- Geometry <- NULL + + # 1. 
determine z indices to get + avail_z_idx <- arrow::open_dataset(file) %>% + dplyr::distinct(ZIndex) %>% + dplyr::pull() %>% + # dplyr::pull(as_vector = TRUE) %>% # switch to this in future and add + # arrow version requirement + sort() + + get_z_idx <- if (setequal(z_index, "all")) { + avail_z_idx + } else if (is.numeric(z_index)) { + z_index <- as.integer(z_index) + if (!all(z_index %in% avail_z_idx)) { + stop(paste("Not all z indices found in cell boundaries.\n + Existing indices are:", paste(avail_z_idx, collapse = " "))) + } + z_index } + if (isTRUE(verbose)) + message("loading poly z_indices: ", paste(get_z_idx, collapse = " ")) + + + # 2. collect by z index filter and convert WKB to multipolygon + multipolygons <- lapply_flex( + get_z_idx, + function(z_idx) { + # set schema + schema <- arrow::open_dataset(file)$schema + schema$EntityID <- arrow::string() + + # read and convert + arrow::open_dataset(file, schema = schema) %>% + dplyr::filter(ZIndex == z_idx) %>% + dplyr::collect() %>% + dplyr::mutate(Geometry = sf::st_as_sfc(Geometry)) + }, + future.seed = TRUE + ) + names(multipolygons) <- lapply( + multipolygons, function(x) paste0("z", unique(x$ZIndex))) + + + # 3. convert to giottoPolygons and append meta + out <- lapply(seq_along(multipolygons), function(i) { + # append poly IDs and meta + poly_table <- multipolygons[[i]] + sv <- terra::vect(poly_table$Geometry) + sv$poly_ID <- poly_table$EntityID + sv$z_level <- poly_table$ZLevel + + gpoly <- giottoPolygon( + name = names(multipolygons)[[i]], + spatVector = sv, + unique_ID_cache = poly_table$EntityID + ) - }) + if (isTRUE(calc_centroids)) { + # NOTE: will not recalculate if centroids are already attached + gpoly <- GiottoClass::centroids(x = gpoly, append_gpolygon = TRUE) + } + }) - return(out) + return(out) } @@ -1262,117 +1302,120 @@ readPolygonVizgenParquet = function( #' @param verbose be verbose #' @seealso \code{\link{smoothGiottoPolygons}} #' @export -readPolygonFilesVizgen = function(gobject, - boundaries_path, - fovs = NULL, - polygon_feat_types = 0:6, - flip_x_axis = F, - flip_y_axis = F, - smooth_polygons = TRUE, - smooth_vertices = 60, - set_neg_to_zero = FALSE, - return_gobject = TRUE, - verbose = TRUE) { - - # define names - poly_feat_names = paste0('z', polygon_feat_types) - poly_feat_indexes = paste0('zIndex_', polygon_feat_types) - - # select FOVs present in the subset - if(is.null(fovs)) { - subset_metadata = pDataDT(gobject) - fovs = unique(subset_metadata$fov) - } - - - - smooth_cell_polygons_list = readPolygonFilesVizgenHDF5(boundaries_path = boundaries_path, - fovs = fovs, - polygon_feat_types = polygon_feat_types, - flip_x_axis = flip_x_axis, - flip_y_axis = flip_y_axis, - smooth_polygons = smooth_polygons, - smooth_vertices = smooth_vertices, - set_neg_to_zero = set_neg_to_zero, - verbose = verbose) - - - if(return_gobject) { - # add cell polygons to Giotto object - names(smooth_cell_polygons_list) = poly_feat_names - gobject = addGiottoPolygons(gobject = gobject, - gpolygons = smooth_cell_polygons_list) - return(gobject) - } else { - return(smooth_cell_polygons_list) - } +readPolygonFilesVizgen <- function(gobject, + boundaries_path, + fovs = NULL, + polygon_feat_types = 0:6, + flip_x_axis = FALSE, + flip_y_axis = FALSE, + smooth_polygons = TRUE, + smooth_vertices = 60, + set_neg_to_zero = FALSE, + return_gobject = TRUE, + verbose = TRUE) { + # define names + poly_feat_names <- paste0("z", polygon_feat_types) + poly_feat_indexes <- paste0("zIndex_", polygon_feat_types) + + # select FOVs present in the 
subset + if (is.null(fovs)) { + subset_metadata <- pDataDT(gobject) + fovs <- unique(subset_metadata$fov) + } + + + + smooth_cell_polygons_list <- readPolygonFilesVizgenHDF5( + boundaries_path = boundaries_path, + fovs = fovs, + polygon_feat_types = polygon_feat_types, + flip_x_axis = flip_x_axis, + flip_y_axis = flip_y_axis, + smooth_polygons = smooth_polygons, + smooth_vertices = smooth_vertices, + set_neg_to_zero = set_neg_to_zero, + verbose = verbose + ) + if (return_gobject) { + # add cell polygons to Giotto object + names(smooth_cell_polygons_list) <- poly_feat_names + gobject <- addGiottoPolygons( + gobject = gobject, + gpolygons = smooth_cell_polygons_list + ) + return(gobject) + } else { + return(smooth_cell_polygons_list) + } } -#' @describeIn readPolygonFilesVizgen (internal) Optimized .hdf5 reading for vizgen -#' merscope output. Returns a data.table of xyz coords and cell_id +#' @describeIn readPolygonFilesVizgen (internal) Optimized .hdf5 reading for +#' vizgen merscope output. Returns a data.table of xyz coords and cell_id #' @keywords internal -.h5_read_vizgen = function(h5File, - z_indices = 1L:7L, - segm_to_use = 'p_0', - H5Fopen_flags = "H5F_ACC_RDWR") { - - # data.table vars - group = name = cell = z_name = otype = d_name = cell_id = NULL - - h5_ls = data.table::setDT(rhdf5::h5ls(h5File, recursive = 5, datasetinfo = FALSE)) - cell_names = as.character(h5_ls[group == '/featuredata', name]) - z_names = h5_ls[grep('zIndex', name), unique(name)] - - dset_names = h5_ls[otype == 'H5I_DATASET' & name == 'coordinates',] - # subset by segm_to_use - dset_names = dset_names[grep(segm_to_use, group),] - # tag cellnames - dset_names[, cell := gsub(pattern = '/featuredata/|/zIndex.*$', replacement = '', x = group)] - # tag z_names - dset_names[, z_name := gsub(pattern = '^.*/(zIndex_\\d*).*$', replacement = '\\1', x = group)] - # subset by z_indices - dset_names = dset_names[z_name %in% z_names[z_indices],] - # create full file location - dset_names[, d_name := paste0(group, '/', name)] - - fid = rhdf5::H5Fopen(h5File, flags = H5Fopen_flags) - dapl = rhdf5::H5Pcreate('H5P_DATASET_ACCESS') - - contents = lapply(cell_names, function(fid, dapl, cell_name) { - - zvals = .h5_read_bare(file = fid, - name = paste0(c('/featuredata', cell_name, 'z_coordinates'), collapse = '/'), - dapl = dapl) - names(zvals) = z_names - - # subset to datasets related to cell - cell_dsets = dset_names[cell == cell_name,] - - cell_data = lapply(seq(nrow(cell_dsets)), function(fid, dapl, zvals, d_i) { - - res = .h5_read_bare(file = fid, name = cell_dsets[d_i, d_name], dapl = dapl) - res = t_flex(res[,,1L]) - res = cbind(res, zvals[cell_dsets[d_i, z_name]]) - colnames(res) = c('x', 'y', 'z') - res - - }, fid = fid, dapl = dapl, zvals = zvals) - cell_data = data.table::as.data.table(do.call('rbind', cell_data)) - cell_data[, cell_id := cell_name] - cell_data - - }, fid = fid, dapl = dapl) - - rhdf5::H5Pclose(dapl) - rhdf5::H5Fclose(fid) - contents = data.table::rbindlist(contents) - contents - +.h5_read_vizgen <- function(h5File, + z_indices = 1L:7L, + segm_to_use = "p_0", + H5Fopen_flags = "H5F_ACC_RDWR") { + # data.table vars + group <- name <- cell <- z_name <- otype <- d_name <- cell_id <- NULL + + h5_ls <- data.table::setDT( + rhdf5::h5ls(h5File, recursive = 5, datasetinfo = FALSE)) + cell_names <- as.character(h5_ls[group == "/featuredata", name]) + z_names <- h5_ls[grep("zIndex", name), unique(name)] + + dset_names <- h5_ls[otype == "H5I_DATASET" & name == "coordinates", ] + # subset by segm_to_use + 
dset_names <- dset_names[grep(segm_to_use, group), ] + # tag cellnames + dset_names[, cell := gsub( + pattern = "/featuredata/|/zIndex.*$", replacement = "", x = group)] + # tag z_names + dset_names[, z_name := gsub( + pattern = "^.*/(zIndex_\\d*).*$", replacement = "\\1", x = group)] + # subset by z_indices + dset_names <- dset_names[z_name %in% z_names[z_indices], ] + # create full file location + dset_names[, d_name := paste0(group, "/", name)] + + fid <- rhdf5::H5Fopen(h5File, flags = H5Fopen_flags) + dapl <- rhdf5::H5Pcreate("H5P_DATASET_ACCESS") + + contents <- lapply(cell_names, function(fid, dapl, cell_name) { + zvals <- .h5_read_bare( + file = fid, + name = paste0( + c("/featuredata", cell_name, "z_coordinates"), collapse = "/"), + dapl = dapl + ) + names(zvals) <- z_names + + # subset to datasets related to cell + cell_dsets <- dset_names[cell == cell_name, ] + + cell_data <- lapply( + seq(nrow(cell_dsets)), function(fid, dapl, zvals, d_i) { + res <- .h5_read_bare( + file = fid, name = cell_dsets[d_i, d_name], dapl = dapl) + res <- t_flex(res[, , 1L]) + res <- cbind(res, zvals[cell_dsets[d_i, z_name]]) + colnames(res) <- c("x", "y", "z") + res + }, fid = fid, dapl = dapl, zvals = zvals) + cell_data <- data.table::as.data.table(do.call("rbind", cell_data)) + cell_data[, cell_id := cell_name] + cell_data + }, fid = fid, dapl = dapl) + + rhdf5::H5Pclose(dapl) + rhdf5::H5Fclose(fid) + contents <- data.table::rbindlist(contents) + contents } @@ -1383,18 +1426,14 @@ readPolygonFilesVizgen = function(gobject, #' @param name dataset name within #' @param dapl HDF5 property list (H5Pcreate('H5P_DATASET_ACCESS')) #' @keywords internal -.h5_read_bare = function(file, name = "", dapl) { - did = .Call("_H5Dopen", file@ID, name, dapl@ID, PACKAGE = "rhdf5") - res = .Call("_H5Dread", did, NULL, NULL, NULL, TRUE, 0L, FALSE, FALSE, - PACKAGE = "rhdf5") - invisible(.Call("_H5Dclose", did, PACKAGE = "rhdf5")) - # C_H5Dopen = C_H5Dread = C_H5Dclose = NULL - # - # did = .Call(C_H5Dopen, file@ID, name, dapl@ID) - # res = .Call(C_H5Dread, did, NULL, NULL, NULL, TRUE, 0L, FALSE, FALSE) - # invisible(.Call(C_H5Dclose, did)) - # - res +.h5_read_bare <- function(file, name = "", dapl) { + did <- .Call("_H5Dopen", file@ID, name, dapl@ID, PACKAGE = "rhdf5") + res <- .Call("_H5Dread", did, NULL, NULL, NULL, TRUE, 0L, FALSE, FALSE, + PACKAGE = "rhdf5" + ) + invisible(.Call("_H5Dclose", did, PACKAGE = "rhdf5")) + + res } @@ -1412,37 +1451,34 @@ readPolygonFilesVizgen = function(gobject, #' @param gef_file path to .gef file #' @param bin_size bin size to select from .gef file #' @export -getGEFtxCoords = function(gef_file, - bin_size = 'bin100') { - - # data.table vars - genes = NULL - - # package check - package_check(pkg_name = 'rhdf5', repository = 'Bioc') - if(!file.exists(gef_file)) stop('File path to .gef file does not exist') - - # Step 1: Parse tx coords - exprDT = rhdf5::h5read(file = gef_file, - name = paste0('geneExp/', bin_size, '/expression')) - setDT(exprDT) - - # Step 2: Parse gene expression info using index - geneDT = rhdf5::h5read(file = gef_file, - name = paste0('geneExp/', bin_size, '/gene')) - setDT(geneDT) +getGEFtxCoords <- function(gef_file, + bin_size = "bin100") { + # data.table vars + genes <- NULL + + # package check + package_check(pkg_name = "rhdf5", repository = "Bioc") + if (!file.exists(gef_file)) stop("File path to .gef file does not exist") + + # Step 1: Parse tx coords + exprDT <- rhdf5::h5read( + file = gef_file, + name = paste0("geneExp/", bin_size, "/expression") + ) + 
setDT(exprDT) - # Step 3: Combine read expression and gene data by repeating count (match offset index) - # See STOMICS file format manual for more information about exprDT and geneDT - exprDT[, genes := rep(x = geneDT$gene, geneDT$count)] + # Step 2: Parse gene expression info using index + geneDT <- rhdf5::h5read( + file = gef_file, + name = paste0("geneExp/", bin_size, "/gene") + ) + setDT(geneDT) - return(exprDT) + # Step 3: Combine read expression and gene data by repeating count + # (match offset index) + # See STOMICS file format manual for more information about exprDT and + # geneDT + exprDT[, genes := rep(x = geneDT$gene, geneDT$count)] + return(exprDT) } - - - - - - - diff --git a/R/giotto_viewer.R b/R/giotto_viewer.R index 3c290327f..f20918e58 100644 --- a/R/giotto_viewer.R +++ b/R/giotto_viewer.R @@ -1,52 +1,48 @@ - - - - #' @title write_giotto_viewer_annotation -#' @description write out factor-like annotation data from a giotto object for the Viewer +#' @description write out factor-like annotation data from a giotto object for +#' the Viewer #' @param annotation annotation from the data.table from giotto object #' @param annot_name name of the annotation #' @param output_directory directory where to save the files #' @return write a .txt and .annot file for the selection annotation #' @keywords internal -write_giotto_viewer_annotation = function(annotation, - annot_name = 'test', - output_directory = getwd()) { - - if(is.numeric(annotation) == TRUE) { - - # annotation information and mapping - sorted_unique_numbers = sort(unique(annotation)) - annot_map = data.table::data.table(num = sorted_unique_numbers, fac = sorted_unique_numbers) - annot_information = annotation - - } else { - - # factors to numerics - uniq_factors = unique(annotation) - uniq_numerics = seq_along(uniq_factors) - - # create converter - uniq_factor_num_converter = uniq_numerics - names(uniq_factor_num_converter) = uniq_factors - - # annotation information and mapping - annot_map = data.table::data.table(num = uniq_numerics, fac = uniq_factors) - annot_information = uniq_factor_num_converter[annotation] +write_giotto_viewer_annotation <- function(annotation, + annot_name = "test", + output_directory = getwd()) { + if (is.numeric(annotation) == TRUE) { + # annotation information and mapping + sorted_unique_numbers <- sort(unique(annotation)) + annot_map <- data.table::data.table( + num = sorted_unique_numbers, fac = sorted_unique_numbers) + annot_information <- annotation + } else { + # factors to numerics + uniq_factors <- unique(annotation) + uniq_numerics <- seq_along(uniq_factors) - } + # create converter + uniq_factor_num_converter <- uniq_numerics + names(uniq_factor_num_converter) <- uniq_factors + # annotation information and mapping + annot_map <- data.table::data.table(num = uniq_numerics, fac = uniq_factors) + annot_information <- uniq_factor_num_converter[annotation] + } - # write to output directory - annot_inf_name = paste0(annot_name,'_annot_information','.txt') - write.table(annot_information, file = paste0(output_directory,'/', annot_inf_name), - quote = F, row.names = F, col.names = F, sep = ' ') - annot_inf_map = paste0(annot_name,'_annot_information','.annot') - write.table(annot_map,file = paste0(output_directory,'/', annot_inf_map), - quote = F, row.names = F, col.names = F, sep = '\t') + # write to output directory + annot_inf_name <- paste0(annot_name, "_annot_information", ".txt") + write.table(annot_information, + file = paste0(output_directory, "/", annot_inf_name), + quote = F, 
row.names = F, col.names = F, sep = " " + ) + annot_inf_map <- paste0(annot_name, "_annot_information", ".annot") + write.table(annot_map, + file = paste0(output_directory, "/", annot_inf_map), + quote = F, row.names = F, col.names = F, sep = "\t" + ) } @@ -58,15 +54,15 @@ write_giotto_viewer_annotation = function(annotation, #' @param output_directory directory where to save the files #' @return write a .txt and .annot file for the selection annotation #' @keywords internal -write_giotto_viewer_numeric_annotation = function(annotation, - annot_name = 'test', - output_directory = getwd()) { - - # write to output directory - annot_inf_map = paste0(annot_name,'_num_annot_information','.txt') - write.table(annotation,file = paste0(output_directory,'/', annot_inf_map), - quote = F, row.names = F, col.names = F, sep = '\t') - +write_giotto_viewer_numeric_annotation <- function(annotation, + annot_name = "test", + output_directory = getwd()) { + # write to output directory + annot_inf_map <- paste0(annot_name, "_num_annot_information", ".txt") + write.table(annotation, + file = paste0(output_directory, "/", annot_inf_map), + quote = F, row.names = F, col.names = F, sep = "\t" + ) } @@ -83,35 +79,33 @@ write_giotto_viewer_numeric_annotation = function(annotation, #' @param output_directory directory where to save the files #' @return write a .txt and .annot file for the selection annotation #' @keywords internal -write_giotto_viewer_dim_reduction = function(dim_reduction_cell, - dim_red = NULL, - dim_red_name = NULL, - dim_red_rounding = NULL, - dim_red_rescale = c(-20,20), - output_directory = getwd()) { - - - dim_red_coord = dim_reduction_cell[[dim_red]][[dim_red_name]]$coordinates[,1:2] - - if(is.null(dim_red_coord)) { - cat('\n combination of ', dim_red, ' and ', dim_red_name, ' does not exist \n') - } else { - - # round dimension reduction coordinates - if(!is.null(dim_red_rounding) & is.integer(dim_red_rounding)) { - dim_red_coord = round(dim_red_coord, digits = dim_red_rounding) - } - - # rescale dimension reduction coordinates - if(!is.null(dim_red_rescale) & length(dim_red_rescale) == 2) { - dim_red_coord = scales::rescale(x = dim_red_coord, to = dim_red_rescale) +write_giotto_viewer_dim_reduction <- function(dim_reduction_cell, + dim_red = NULL, + dim_red_name = NULL, + dim_red_rounding = NULL, + dim_red_rescale = c(-20, 20), + output_directory = getwd()) { + dim_red_coord <- dim_reduction_cell[[dim_red]][[dim_red_name]]$coordinates[, 1:2] + + if (is.null(dim_red_coord)) { + cat("\n combination of ", dim_red, " and ", dim_red_name, " does not exist \n") + } else { + # round dimension reduction coordinates + if (!is.null(dim_red_rounding) & is.integer(dim_red_rounding)) { + dim_red_coord <- round(dim_red_coord, digits = dim_red_rounding) + } + + # rescale dimension reduction coordinates + if (!is.null(dim_red_rescale) & length(dim_red_rescale) == 2) { + dim_red_coord <- scales::rescale(x = dim_red_coord, to = dim_red_rescale) + } + + dim_red_name <- paste0(dim_red, "_", dim_red_name, "_dim_coord.txt") + write.table(dim_red_coord, + file = paste0(output_directory, "/", dim_red_name), + quote = F, row.names = F, col.names = F, sep = " " + ) } - - dim_red_name = paste0(dim_red,'_',dim_red_name,'_dim_coord.txt') - write.table(dim_red_coord, file = paste0(output_directory,'/', dim_red_name), - quote = F, row.names = F, col.names = F, sep = ' ') - - } } @@ -142,254 +136,270 @@ write_giotto_viewer_dim_reduction = function(dim_reduction_cell, #' and add the gene signature names (.e.g cell types) 
to the numeric annotations parameter. #' @export #' @examples -#' #' \dontrun{ #' #' data(mini_giotto_single_cell) #' exportGiottoViewer(mini_giotto_single_cell) -#' #' } #' -exportGiottoViewer = function(gobject, - spat_unit = NULL, - feat_type = NULL, - spat_loc_name = 'raw', - output_directory = NULL, - spat_enr_names = NULL, - factor_annotations = NULL, - numeric_annotations = NULL, - dim_reductions, - dim_reduction_names, - expression_values = c('scaled', 'normalized', 'custom'), - dim_red_rounding = NULL, - dim_red_rescale = c(-20,20), - expression_rounding = 2, - overwrite_dir = TRUE, - verbose = T) { - - ## output directory ## - if(file.exists(output_directory)) { - if(overwrite_dir == TRUE) { - cat('\n output directory already exists, files will be overwritten \n') +exportGiottoViewer <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + spat_loc_name = "raw", + output_directory = NULL, + spat_enr_names = NULL, + factor_annotations = NULL, + numeric_annotations = NULL, + dim_reductions, + dim_reduction_names, + expression_values = c("scaled", "normalized", "custom"), + dim_red_rounding = NULL, + dim_red_rescale = c(-20, 20), + expression_rounding = 2, + overwrite_dir = TRUE, + verbose = T) { + ## output directory ## + if (file.exists(output_directory)) { + if (overwrite_dir == TRUE) { + cat("\n output directory already exists, files will be overwritten \n") + } else { + stop("\n output directory already exists, change overwrite_dir = TRUE to overwrite files \n") + } + } else if (is.null(output_directory)) { + cat("\n no output directory is provided, defaults to current directory: ", getwd(), "\n") + output_directory <- getwd() } else { - stop('\n output directory already exists, change overwrite_dir = TRUE to overwrite files \n') + cat("output directory is created \n") + dir.create(output_directory, recursive = T) } - } else if(is.null(output_directory)) { - cat('\n no output directory is provided, defaults to current directory: ', getwd(), '\n') - output_directory = getwd() - } else { - cat('output directory is created \n') - dir.create(output_directory, recursive = T) - } - - - # set feat type - if(is.null(feat_type)) { - feat_type = gobject@expression_feat[[1]] - } - - - if(verbose == TRUE) cat('\n write cell and gene IDs \n') - - ### output cell_IDs ### - giotto_cell_ids = gobject@cell_ID - write.table(giotto_cell_ids, file = paste0(output_directory,'/','giotto_cell_ids.txt'), - quote = F, row.names = F, col.names = F, sep = ' ') - - ### output all feat_IDs ### - possible_feat_types = names(gobject@feat_ID) - feat_type = feat_type[feat_type %in% possible_feat_types] - - for(feat in feat_type) { - giotto_feat_ids = gobject@feat_ID[[feat]] - write.table(giotto_feat_ids, file = paste0(output_directory,'/','giotto_',feat,'_ids.txt'), - quote = F, row.names = F, col.names = F, sep = ' ') - } - - - ### physical location ### - if(verbose == TRUE) cat('write physical centroid locations\n') - # data.table variables - sdimx = sdimy = NULL - spatial_location = get_spatial_locations(gobject = gobject, - spat_loc_name = spat_loc_name) - spatial_location = spatial_location[, .(sdimx, sdimy)] - write.table(spatial_location, file = paste0(output_directory,'/','centroid_locations.txt'), - quote = F, row.names = F, col.names = F, sep = ' ') - ### offset file ### - offset_file = gobject@offset_file - if(!is.null(offset_file)) { - if(verbose == TRUE) cat('write offset file \n') - write.table(offset_file, file = paste0(output_directory,'/','offset_file.txt'), - quote = F, row.names = F, 
col.names = F, sep = ' ') - } - - - - ### annotations ### - for(feat in feat_type) { - - - if(verbose == TRUE) cat('\n for feature type ', feat, ' do: ','\n') - - cell_metadata = combineMetadata(gobject = gobject, - feat_type = feat, - spat_enr_names = spat_enr_names) - - - # factor annotations # - if(!is.null(factor_annotations)) { - found_factor_annotations = factor_annotations[factor_annotations %in% colnames(cell_metadata)] - - for(sel_annot in found_factor_annotations) { - - if(verbose == TRUE) cat('\n write annotation data for: ', sel_annot,'\n') - - selected_annotation = cell_metadata[[sel_annot]] - write_giotto_viewer_annotation(annotation = selected_annotation, - annot_name = paste0(feat,'_', sel_annot), - output_directory = output_directory) - - } - - # annotiation list # - text_file_names = list() - annot_names = list() - for(sel_annot_id in seq_along(found_factor_annotations)) { - - sel_annot_name = found_factor_annotations[sel_annot_id] - annot_inf_name = paste0(sel_annot_name,'_annot_information.txt') - - annot_names[[sel_annot_id]] = sel_annot_name - text_file_names[[sel_annot_id]] = annot_inf_name - - } - - annot_list = data.table(txtfiles = unlist(text_file_names), names = unlist(annot_names)) - write.table(annot_list, file = paste0(output_directory,'/','annotation_list','_',feat, '.txt'), - quote = F, row.names = F, col.names = F, sep = ' ') + # set feat type + if (is.null(feat_type)) { + feat_type <- gobject@expression_feat[[1]] } + if (verbose == TRUE) cat("\n write cell and gene IDs \n") - # numeric annotations # - if(!is.null(numeric_annotations)) { - found_numeric_annotations = numeric_annotations[numeric_annotations %in% colnames(cell_metadata)] - for(sel_annot in found_numeric_annotations) { + ### output cell_IDs ### + giotto_cell_ids <- gobject@cell_ID + write.table(giotto_cell_ids, + file = paste0(output_directory, "/", "giotto_cell_ids.txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) - if(verbose == TRUE) cat('\n write annotation data for: ', sel_annot,'\n') - selected_annotation = cell_metadata[[sel_annot]] - write_giotto_viewer_numeric_annotation(annotation = selected_annotation, - annot_name = paste0(feat,'_', sel_annot), - output_directory = output_directory) + ### output all feat_IDs ### + possible_feat_types <- names(gobject@feat_ID) + feat_type <- feat_type[feat_type %in% possible_feat_types] - } - - - - # numeric annotiation list # - text_file_names = list() - annot_names = list() - for(sel_annot_id in seq_along(found_numeric_annotations)) { - - sel_annot_name = found_numeric_annotations[sel_annot_id] - annot_inf_name = paste0(sel_annot_name,'_num_annot_information.txt') - - annot_names[[sel_annot_id]] = sel_annot_name - text_file_names[[sel_annot_id]] = annot_inf_name - - } - - annot_list = data.table(txtfiles = unlist(text_file_names), names = unlist(annot_names)) - write.table(annot_list, file = paste0(output_directory,'/','annotation_num_list','_',feat, '.txt'), - quote = F, row.names = F, col.names = F, sep = ' ') + for (feat in feat_type) { + giotto_feat_ids <- gobject@feat_ID[[feat]] + write.table(giotto_feat_ids, + file = paste0(output_directory, "/", "giotto_", feat, "_ids.txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) } - } + ### physical location ### + if (verbose == TRUE) message("write physical centroid locations") + + # data.table variables + sdimx <- sdimy <- NULL + spatial_location <- get_spatial_locations( + gobject = gobject, + spat_loc_name = spat_loc_name + ) + spatial_location 
<- spatial_location[, .(sdimx, sdimy)] + write.table(spatial_location, + file = paste0(output_directory, "/", "centroid_locations.txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) + + ### offset file ### + offset_file <- gobject@offset_file + if (!is.null(offset_file)) { + if (verbose == TRUE) message("write offset file") + write.table(offset_file, + file = paste0(output_directory, "/", "offset_file.txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) + } - ## end feat type loop + ### annotations ### + for (feat in feat_type) { + if (verbose == TRUE) cat("\n for feature type ", feat, " do: ", "\n") + + cell_metadata <- combineMetadata( + gobject = gobject, + feat_type = feat, + spat_enr_names = spat_enr_names + ) + + + # factor annotations # + if (!is.null(factor_annotations)) { + found_factor_annotations <- factor_annotations[factor_annotations %in% colnames(cell_metadata)] + + for (sel_annot in found_factor_annotations) { + if (verbose == TRUE) cat("\n write annotation data for: ", sel_annot, "\n") + + selected_annotation <- cell_metadata[[sel_annot]] + write_giotto_viewer_annotation( + annotation = selected_annotation, + annot_name = paste0(feat, "_", sel_annot), + output_directory = output_directory + ) + } + + # annotiation list # + text_file_names <- list() + annot_names <- list() + for (sel_annot_id in seq_along(found_factor_annotations)) { + sel_annot_name <- found_factor_annotations[sel_annot_id] + annot_inf_name <- paste0(sel_annot_name, "_annot_information.txt") + + annot_names[[sel_annot_id]] <- sel_annot_name + text_file_names[[sel_annot_id]] <- annot_inf_name + } + + annot_list <- data.table( + txtfiles = unlist(text_file_names), names = unlist(annot_names)) + write.table(annot_list, + file = paste0(output_directory, "/", + "annotation_list", "_", feat, ".txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) + } + + + + # numeric annotations # + if (!is.null(numeric_annotations)) { + found_numeric_annotations <- numeric_annotations[numeric_annotations %in% colnames(cell_metadata)] + for (sel_annot in found_numeric_annotations) { + if (verbose == TRUE) cat("\n write annotation data for: ", sel_annot, "\n") + selected_annotation <- cell_metadata[[sel_annot]] + write_giotto_viewer_numeric_annotation( + annotation = selected_annotation, + annot_name = paste0(feat, "_", sel_annot), + output_directory = output_directory + ) + } + + + + # numeric annotiation list # + text_file_names <- list() + annot_names <- list() + for (sel_annot_id in seq_along(found_numeric_annotations)) { + sel_annot_name <- found_numeric_annotations[sel_annot_id] + annot_inf_name <- paste0(sel_annot_name, "_num_annot_information.txt") + + annot_names[[sel_annot_id]] <- sel_annot_name + text_file_names[[sel_annot_id]] <- annot_inf_name + } + + annot_list <- data.table( + txtfiles = unlist(text_file_names), names = unlist(annot_names)) + write.table(annot_list, + file = paste0(output_directory, "/", + "annotation_num_list", "_", feat, ".txt"), + quote = FALSE, row.names = FALSE, col.names = FALSE, sep = " " + ) + } + } + ## end feat type loop - ### dimension reduction ### - dim_reduction_cell = gobject@dimension_reduction$cells - for(i in seq_along(dim_reduction_names)) { - temp_dim_red = dim_reductions[i] - temp_dim_red_name = dim_reduction_names[i] - if(verbose == TRUE) cat('write annotation data for: ', temp_dim_red, ' for ', temp_dim_red_name,'\n') + ### dimension reduction ### + dim_reduction_cell <- gobject@dimension_reduction$cells - 
write_giotto_viewer_dim_reduction(dim_reduction_cell = dim_reduction_cell, - dim_red = temp_dim_red, - dim_red_name = temp_dim_red_name, - dim_red_rounding = dim_red_rounding, - dim_red_rescale = dim_red_rescale, - output_directory = output_directory) - } + for (i in seq_along(dim_reduction_names)) { + temp_dim_red <- dim_reductions[i] + temp_dim_red_name <- dim_reduction_names[i] + if (verbose == TRUE) cat("write annotation data for: ", temp_dim_red, " for ", temp_dim_red_name, "\n") + write_giotto_viewer_dim_reduction( + dim_reduction_cell = dim_reduction_cell, + dim_red = temp_dim_red, + dim_red_name = temp_dim_red_name, + dim_red_rounding = dim_red_rounding, + dim_red_rescale = dim_red_rescale, + output_directory = output_directory + ) + } - ### expression data ### - # expression values to be used - if(verbose == TRUE) cat('\n write expression values \n') - values = match.arg(expression_values, unique(c( 'scaled', 'normalized', 'custom', expression_values))) - for(feat in feat_type) { - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat, - values = values) - expr_values = as.matrix(expr_values) - # swap cell_IDs for numerical values - colnames(expr_values) = 1:ncol(expr_values) - # round values - if(!is.null(expression_rounding)) { - expr_values = round(x = expr_values, digits = expression_rounding) - } - output_directory_norm = normalizePath(output_directory) - fileWrite_directory = paste0(output_directory_norm,'/',"giotto_expression.csv") - data.table::fwrite(data.table::as.data.table(expr_values, keep.rownames="gene"), file=fileWrite_directory, sep=",", quot=F, row.names=F, col.names=T) - - - if(verbose == TRUE) cat('\n finished writing giotto viewer files to', output_directory , '\n') - - if(verbose == TRUE){ - cat("\n") - cat("================================================================", "\n") - cat("Next steps. Please manually run the following in a SHELL terminal:", "\n") - cat("================================================================", "\n") - cat("cd ", output_directory, "\n") - cat("giotto_setup_image --require-stitch=n --image=n --image-multi-channel=n --segmentation=n --multi-fov=n --output-json=step1.json", "\n") - cat("smfish_step1_setup -c step1.json", "\n") - cat("giotto_setup_viewer --num-panel=2 --input-preprocess-json=step1.json --panel-1=PanelPhysicalSimple --panel-2=PanelTsne --output-json=step2.json --input-annotation-list=annotation_list.txt", "\n") - cat("smfish_read_config -c step2.json -o test.dec6.js -p test.dec6.html -q test.dec6.css", "\n") - cat("giotto_copy_js_css --output .", "\n") - cat("python3 -m http.server", "\n") - cat("================================================================", "\n") - cat("\n") - cat("Finally, open your browser, navigate to http://localhost:8000/. 
Then click on the file test.dec6.html to see the viewer.", "\n") - cat("\n") - cat("\n") - cat("For more information, http://spatialgiotto.rc.fas.harvard.edu/giotto.viewer.setup3.html", "\n") - cat("\n") + ### expression data ### + # expression values to be used + if (verbose == TRUE) cat("\n write expression values \n") + values <- match.arg(expression_values, unique(c("scaled", "normalized", "custom", expression_values))) + + for (feat in feat_type) { + expr_values <- get_expression_values( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat, + values = values + ) + expr_values <- as.matrix(expr_values) + + # swap cell_IDs for numerical values + colnames(expr_values) <- 1:ncol(expr_values) + # round values + if (!is.null(expression_rounding)) { + expr_values <- round(x = expr_values, digits = expression_rounding) + } + output_directory_norm <- normalizePath(output_directory) + fileWrite_directory <- paste0( + output_directory_norm, "/", "giotto_expression.csv") + data.table::fwrite( + data.table::as.data.table(expr_values, keep.rownames = "gene"), + file = fileWrite_directory, sep = ",", + quote = FALSE, row.names = FALSE, col.names = TRUE) + + + if (verbose == TRUE) + cat("finished writing giotto viewer files to", output_directory) + + if (verbose == TRUE) { + message("=========================================================") + message("Next steps. Please manually run the following in a SHELL + terminal:") + message("=========================================================") + message(paste("cd ", output_directory)) + message("giotto_setup_image --require-stitch=n --image=n + --image-multi-channel=n --segmentation=n --multi-fov=n + --output-json=step1.json") + message("smfish_step1_setup -c step1.json") + message("giotto_setup_viewer --num-panel=2 + --input-preprocess-json=step1.json + --panel-1=PanelPhysicalSimple --panel-2=PanelTsne + --output-json=step2.json + --input-annotation-list=annotation_list.txt") + message("smfish_read_config -c step2.json -o test.dec6.js + -p test.dec6.html -q test.dec6.css") + message("giotto_copy_js_css --output .") + message("python3 -m http.server") + message("=========================================================") + message("Finally, open your browser, navigate to + http://localhost:8000/. Then click on the file + test.dec6.html to see the viewer.") + message("For more information, http://spatialgiotto.rc.fas.harvard.edu/giotto.viewer.setup3.html", "\n") + } } - - } - } - - - - diff --git a/R/globals.R b/R/globals.R index 67257c5d3..53c08c870 100644 --- a/R/globals.R +++ b/R/globals.R @@ -1,7 +1,8 @@ -utils::globalVariables(names = c(":=", ".N", ".SD", ".", "cast", - "%--%", ".inc", # igraph - "python_leiden", "python_louvain", "python_spatial_genes", - "Spatial_DE_AEH", "Spatial_DE", "silhouette_rank", - "python_scrublet", "python_create_mesmer_app", - "python_segment_image")) - +utils::globalVariables(names = c( + ":=", ".N", ".SD", ".", "cast", + "%--%", ".inc", # igraph + "python_leiden", "python_louvain", "python_spatial_genes", + "Spatial_DE_AEH", "Spatial_DE", "silhouette_rank", + "python_scrublet", "python_create_mesmer_app", + "python_segment_image" +)) diff --git a/R/gstop.R b/R/gstop.R index 812bc3571..be2a805e1 100644 --- a/R/gstop.R +++ b/R/gstop.R @@ -3,22 +3,22 @@ # original call is desired. 
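# Illustrative sketch: the files exportGiottoViewer() (shown above) is expected
# to write, based on the write.table()/fwrite() calls in its body. The output
# directory name and the "rna" feature type are assumptions; the annotation
# list files only appear when factor/numeric annotations were requested.
viewer_out <- "viewer_out"
expected_files <- c(
    "giotto_cell_ids.txt",         # one cell ID per line
    "giotto_rna_ids.txt",          # feature IDs, one file per feature type
    "centroid_locations.txt",      # sdimx / sdimy centroids
    "annotation_list_rna.txt",     # factor annotation file listing
    "annotation_num_list_rna.txt", # numeric annotation file listing
    "giotto_expression.csv"        # rounded expression matrix, genes as rows
)
file.exists(file.path(viewer_out, expected_files))  # quick completeness check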
# .n should be increased to 2L when within a generic method .gstop <- function(..., - sep = " ", - strWidth = 100, - errWidth = FALSE, - .prefix = " ", - .initial = "", - .n = 1L, - .call = FALSE) { - GiottoUtils::gstop( - ..., - sep = sep, - strWidth = strWidth, - errWidth = errWidth, - .module = "Giotto", - .prefix = .prefix, - .initial = .initial, - .n = .n + 1L, - .call = .call - ) + sep = " ", + strWidth = 100, + errWidth = FALSE, + .prefix = " ", + .initial = "", + .n = 1L, + .call = FALSE) { + GiottoUtils::gstop( + ..., + sep = sep, + strWidth = strWidth, + errWidth = errWidth, + .module = "Giotto", + .prefix = .prefix, + .initial = .initial, + .n = .n + 1L, + .call = .call + ) } diff --git a/R/image_registration.R b/R/image_registration.R index 04a7dc28d..c86ea6ffc 100644 --- a/R/image_registration.R +++ b/R/image_registration.R @@ -1,4 +1,3 @@ - #' @include suite_reexports.R ### Image registration and creation of registered Giotto object #### @@ -6,93 +5,120 @@ #' @name .trakem2_rigid_transforms #' @title Read trakem2 rigid transforms -#' @description Extract rigid registration transformation values from FIJI TrakEM2 xml file. Generated through register_virtual_stack_slices. +#' @description Extract rigid registration transformation values from FIJI +#' TrakEM2 xml file. Generated through register_virtual_stack_slices. #' @param inputstring string read in from TrakeEM2 xml file #' @keywords internal -.trakem2_rigid_transforms = function(inputstring) { - - #Catch wrong inputs - if (grepl('^.*trakem2.*', inputstring, ignore.case = T) != 1) { - stop('xml files must be in TrakeEM2 format') - } - - - #Regex to find the values from the TrakEM2 .xml - transfExtractPatA = '.*.*' - transfExtractPatB = '.*class=\\"mpicbg.trakem2.transform.TranslationModel2D\\" data=\\"(.*?)\\" />.*' - #Split relevant text into numerical values - out <- c(sapply(strsplit(regmatches(x = inputstring, - m = regexec(pattern = transfExtractPatA, - text = inputstring))[[1]][2], - split = ' '), - function(x) as.numeric(x)), - sapply(strsplit(regmatches(x = inputstring, - m = regexec(pattern = transfExtractPatB, - text = inputstring))[[1]][2], - split = ' '), - function(x) as.numeric(x))) - - if(sum(is.na(out)) == 2) { - out = rep(0,5) - } - - out = c(out,0,0) - out = data.table::data.table(t(matrix(out))) - colnames(out) = c('Theta','Xtransform','Ytransform','itx','ity','XFinalTransform','YFinalTransform') - - #itx and ity are additional values in the trakem2 xml files that must be added to Xtransform and Ytransform in order to get the final transformation values. - #only relevant for sampleset with more than 1 slice away from the reference image - out$XFinalTransform = out$Xtransform + out$itx - out$YFinalTransform = out$Ytransform + out$ity - - #Multiply theta by -1 due to differences in R and image plotting coordinates - out$Theta <- -out$Theta - - return(out) -} - +.trakem2_rigid_transforms <- function(inputstring) { + # Catch wrong inputs + if (grepl("^.*trakem2.*", inputstring, ignore.case = TRUE) != 1) { + stop("xml files must be in TrakeEM2 format") + } -#' @title Rigid transform spatial locations -#' @name .rigid_transform_spatial_locations -#' @description Performs appropriate transforms to align spatial locations with registered images. -#' @param spatlocs input spatial locations -#' @param transform_values transformation values to use -#' @param method which method is used for image registration -#' @keywords internal -#Rotation is performed first, followed by XY transform. 
-.rigid_transform_spatial_locations = function(spatlocs, - transform_values, - method) { - if(method == 'fiji') { - spatlocsXY = spatlocs[,c('sdimx','sdimy')] - #These functions must be performed in positive y values - spatlocsXY$sdimy = -1 * spatlocsXY$sdimy + # Regex to find the values from the TrakEM2 .xml + transfExtractPatA <- '.*.*' + transfExtractPatB <- '.*class=\\"mpicbg.trakem2.transform.TranslationModel2D\\" data=\\"(.*?)\\" />.*' + # Split relevant text into numerical values + out <- c( + sapply( + strsplit( + regmatches( + x = inputstring, + m = regexec( + pattern = transfExtractPatA, + text = inputstring + ) + )[[1]][2], + split = " " + ), + function(x) as.numeric(x) + ), + sapply( + strsplit( + regmatches( + x = inputstring, + m = regexec( + pattern = transfExtractPatB, + text = inputstring + ) + )[[1]][2], + split = " " + ), + function(x) as.numeric(x) + ) + ) - spatlocsXY <- spin(spatlocsXY, GiottoUtils::degrees(transform_values$Theta)) %>% - spatShift(dx = transform_values$XFinalTransform, - dy = transform_values$YFinalTransform) + if (sum(is.na(out)) == 2) { + out <- rep(0, 5) + } - spatlocs$sdimx = spatlocsXY$sdimx - spatlocs$sdimy = -1 * spatlocsXY$sdimy + out <- c(out, 0, 0) + out <- data.table::data.table(t(matrix(out))) + colnames(out) <- c("Theta", "Xtransform", "Ytransform", "itx", "ity", + "XFinalTransform", "YFinalTransform") - return(spatlocs) + # itx and ity are additional values in the trakem2 xml files that must be + # added to Xtransform and Ytransform in order to get the final + # transformation values. + # only relevant for sampleset with more than 1 slice away from the + # reference image + out$XFinalTransform <- out$Xtransform + out$itx + out$YFinalTransform <- out$Ytransform + out$ity - } else if(method == 'rvision') { + # Multiply theta by -1 due to differences in R and image plotting + # coordinates + out$Theta <- -out$Theta - spatLocsXY = spatlocs[,c('sdimx','sdimy')] %>% - spin(GiottoUtils::degrees(acos(transform_values[1,1]))) %>% - spatShift(dx = -transform_values[1,3], - dy = -transform_values[2,3]) + return(out) +} - spatlocs$sdimx = spatLocsXY[,1] - spatlocs$sdimy = spatLocsXY[,2] - return(spatlocs) - } else { - stop('Image registration method must be provided. Only "fiji" and "rvision" methods currently supported.') - } +#' @title Rigid transform spatial locations +#' @name .rigid_transform_spatial_locations +#' @description Performs appropriate transforms to align spatial locations +#' with registered images. +#' @param spatlocs input spatial locations +#' @param transform_values transformation values to use +#' @param method which method is used for image registration +#' @keywords internal +# Rotation is performed first, followed by XY transform. 
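# Illustrative sketch of what .trakem2_rigid_transforms() (above) returns: a
# one-row data.table whose column names are set in the function body. The
# numeric values here are invented; real values are parsed from a TrakEM2 xml
# written by register_virtual_stack_slices.
example_transform <- data.table::data.table(
    Theta = -1.2,          # rotation, sign-flipped for R plotting coordinates
    Xtransform = 10, Ytransform = -5,
    itx = 2, ity = 3,      # extra offsets for slices away from the reference
    XFinalTransform = 12,  # Xtransform + itx
    YFinalTransform = -2   # Ytransform + ity
)
# A table like this is what transform_values is expected to look like when
# .rigid_transform_spatial_locations(..., method = "fiji") is called below.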
+.rigid_transform_spatial_locations <- function(spatlocs, + transform_values, + method) { + if (method == "fiji") { + spatlocsXY <- spatlocs[, c("sdimx", "sdimy")] + # These functions must be performed in positive y values + spatlocsXY$sdimy <- -1 * spatlocsXY$sdimy + + spatlocsXY <- spin(spatlocsXY, GiottoUtils::degrees( + transform_values$Theta)) %>% + spatShift( + dx = transform_values$XFinalTransform, + dy = transform_values$YFinalTransform + ) + + spatlocs$sdimx <- spatlocsXY$sdimx + spatlocs$sdimy <- -1 * spatlocsXY$sdimy + + return(spatlocs) + } else if (method == "rvision") { + spatLocsXY <- spatlocs[, c("sdimx", "sdimy")] %>% + spin(GiottoUtils::degrees(acos(transform_values[1, 1]))) %>% + spatShift( + dx = -transform_values[1, 3], + dy = -transform_values[2, 3] + ) + + spatlocs$sdimx <- spatLocsXY[, 1] + spatlocs$sdimy <- spatLocsXY[, 2] + + return(spatlocs) + } else { + stop('Image registration method must be provided. Only "fiji" and + "rvision" methods currently supported.') + } } #' @title Find minmax of registered image @@ -105,102 +131,116 @@ #' @param transform_values transformation values to use #' @param method method of registration #' @keywords internal -#Automatically account for changes in image size due to alignment -.reg_img_minmax_finder = function(gobject_list, - image_unreg = NULL, - largeImage_unreg = NULL, #TODO Currently unused - scale_factor, - transform_values, - method) { - - - #Find image spatial info from original image if possible - #Check to make sure that image_unreg finds an existing image in each gobject to be registered - imgPresent = function(gobject, image, img_type) { - image %in% list_images_names(gobject = gobject, img_type = img_type) - } - - if(!is.null(image_unreg)) img_type = 'image' #TODO needs reworking - if(!is.null(largeImage_unreg)) img_type = 'largeImage' #TODO needs reworking - currently only pays attention to 'image' and not 'largeImage' types - - if(all(as.logical(lapply(X = gobject_list, FUN = imgPresent, image = image_unreg, img_type = img_type)))) { - - giottoImage_list = lapply(X = gobject_list, FUN = get_giottoImage, name = image_unreg, image_type = img_type) - image_corners = lapply(giottoImage_list, .get_img_corners) - - # Infer image corners of registered images PRIOR TO REGISTRATION - # scale unreg_image corners to registered image (use reg_scalefactor/unreg_scalefactor as scale factor) - image_corners = lapply_flex( - seq_along(gobject_list), - function(x) { - rescale( - image_corners[[x]], - (scale_factor[[x]]/giottoImage_list[[x]]@scale_factor), - x0 = 0, y0 = 0) - } - ) +# Automatically account for changes in image size due to alignment +.reg_img_minmax_finder <- function(gobject_list, + image_unreg = NULL, + largeImage_unreg = NULL, # TODO Currently unused + scale_factor, + transform_values, + method) { + # Find image spatial info from original image if possible + # Check to make sure that image_unreg finds an existing image in each + # gobject to be registered + imgPresent <- function(gobject, image, img_type) { + image %in% list_images_names(gobject = gobject, img_type = img_type) + } - # register corners based on transform values (only possible at reg_image scaling) - image_corners_reg = lapply( - seq_along(image_corners), - function(x) { - .rigid_transform_spatial_locations( - spatlocs = image_corners[[x]], - transform_values = transform_values[[x]], - method = method + if (!is.null(image_unreg)) img_type <- "image" # TODO needs reworking + if (!is.null(largeImage_unreg)) img_type <- "largeImage" # TODO needs + # 
reworking - currently only pays attention to 'image' and not + # 'largeImage' types + + if (all(as.logical(lapply( + X = gobject_list, FUN = imgPresent, image = image_unreg, + img_type = img_type)))) { + giottoImage_list <- lapply( + X = gobject_list, FUN = get_giottoImage, name = image_unreg, + image_type = img_type) + image_corners <- lapply(giottoImage_list, .get_img_corners) + + # Infer image corners of registered images PRIOR TO REGISTRATION + # scale unreg_image corners to registered image (use + # reg_scalefactor/unreg_scalefactor as scale factor) + image_corners <- lapply_flex( + seq_along(gobject_list), + function(x) { + rescale( + image_corners[[x]], + (scale_factor[[x]] / giottoImage_list[[x]]@scale_factor), + x0 = 0, y0 = 0 + ) + } ) - } - ) - # Return registered corners to spatloc scaling - image_corners_reg = lapply( - seq_along(image_corners_reg), - function(x) { - rescale(image_corners_reg[[x]], (1/scale_factor[[x]]), x0 = 0, y0 = 0) - } - ) + # register corners based on transform values (only possible at + # reg_image scaling) + image_corners_reg <- lapply( + seq_along(image_corners), + function(x) { + .rigid_transform_spatial_locations( + spatlocs = image_corners[[x]], + transform_values = transform_values[[x]], + method = method + ) + } + ) + + # Return registered corners to spatloc scaling + image_corners_reg <- lapply( + seq_along(image_corners_reg), + function(x) { + rescale( + image_corners_reg[[x]], (1 / scale_factor[[x]]), x0 = 0, y0 = 0) + } + ) - # combine list then find new image bound minmax - image_corners_reg = do.call(rbind, image_corners_reg) - minmaxRegVals = list('xmax_reg' = max(image_corners_reg$sdimx), - 'xmin_reg' = min(image_corners_reg$sdimx), - 'ymax_reg' = max(image_corners_reg$sdimy), - 'ymin_reg' = min(image_corners_reg$sdimy)) - - #return the minmax values - already scaled to spatlocs - return(minmaxRegVals) - } else { - warning('Original images must be supplied for registered images to be aligned.\n') - } + # combine list then find new image bound minmax + image_corners_reg <- do.call(rbind, image_corners_reg) + minmaxRegVals <- list( + "xmax_reg" = max(image_corners_reg$sdimx), + "xmin_reg" = min(image_corners_reg$sdimx), + "ymax_reg" = max(image_corners_reg$sdimy), + "ymin_reg" = min(image_corners_reg$sdimy) + ) + + # return the minmax values - already scaled to spatlocs + return(minmaxRegVals) + } else { + warning("Original images must be supplied for registered images to be + aligned.") + } } #' @title Get image corners #' @name .get_img_corners -#' @description finds four corner spatial coords of giottoImages or magick-images +#' @description finds four corner spatial coords of giottoImages or +#' magick-images #' @param img_object giottoImage or magick-image to use #' @keywords internal -.get_img_corners = function(img_object) { - if(methods::is(img_object,'giottoImage')) { - img_dims = get_img_minmax(img_object@mg_object) - } else if(methods::is(img_object,'magick-image')) { - img_dims = get_img_minmax(img_object) - } else { - stop('img_object must be either a giottoImage or a magick-image \n') - } - - upper_left = c(img_dims$img_xmin,img_dims$img_ymax) - lower_left = c(img_dims$img_xmin,img_dims$img_ymin) - lower_right = c(img_dims$img_xmax,img_dims$img_ymin) - upper_right = c(img_dims$img_xmax,img_dims$img_ymax) - - imageCorners = rbind(upper_left, - lower_left, - lower_right, - upper_right) - colnames(imageCorners) = c('sdimx','sdimy') - imageCorners = as.data.frame(imageCorners) - return(imageCorners) +.get_img_corners <- 
function(img_object) { + if (methods::is(img_object, "giottoImage")) { + img_dims <- get_img_minmax(img_object@mg_object) + } else if (methods::is(img_object, "magick-image")) { + img_dims <- get_img_minmax(img_object) + } else { + stop("img_object must be either a giottoImage or a magick-image") + } + + upper_left <- c(img_dims$img_xmin, img_dims$img_ymax) + lower_left <- c(img_dims$img_xmin, img_dims$img_ymin) + lower_right <- c(img_dims$img_xmax, img_dims$img_ymin) + upper_right <- c(img_dims$img_xmax, img_dims$img_ymax) + + imageCorners <- rbind( + upper_left, + lower_left, + lower_right, + upper_right + ) + colnames(imageCorners) <- c("sdimx", "sdimy") + imageCorners <- as.data.frame(imageCorners) + return(imageCorners) } @@ -209,416 +249,498 @@ #' @title registerGiottoObjectList #' @name registerGiottoObjectList -#' @description Wrapper function for registerGiottoObjectListFiji and registerGiottoObjectListRvision +#' @description Wrapper function for registerGiottoObjectListFiji and +#' registerGiottoObjectListRvision #' @param gobject_list List of gobjects to register #' @param spat_unit spatial unit -#' @param method Method used to align gobjects. Current options are either using FIJI register_virtual_stack_slices output or rvision +#' @param method Method used to align gobjects. Current options are either +#' using FIJI register_virtual_stack_slices output or rvision #' @param image_unreg Gobject image slot to use. Defaults to 'image' (optional) -#' @param image_reg_name Arbitrary image slot name for registered images to occupy. Defaults to replacement of 'image' slot (optional) +#' @param image_reg_name Arbitrary image slot name for registered images to +#' occupy. Defaults to replacement of 'image' slot (optional) #' @param image_list RVISION - under construction #' @param save_dir RVISION - under construction -#' @param spatloc_unreg Unregistered spatial locations to align. Defaults to 'raw' slot (optional) -#' @param spatloc_reg_name Arbitrary name for registered spatial locations. Defaults to replacement of 'raw' slot (optional) +#' @param spatloc_unreg Unregistered spatial locations to align. Defaults to +#' 'raw' slot (optional) +#' @param spatloc_reg_name Arbitrary name for registered spatial locations. 
+#' Defaults to replacement of 'raw' slot (optional) #' @param fiji_xml_files Filepaths to FIJI registration XML outputs -#' @param fiji_registered_images Registered images output by FIJI register_virtual_stack_slices +#' @param fiji_registered_images Registered images output by FIJI +#' register_virtual_stack_slices #' @param scale_factor Scaling to be applied to spatial coordinates -#' @param allow_rvision_autoscale Whether or not to allow rvision to automatically scale the images when performing image registration +#' @param allow_rvision_autoscale Whether or not to allow rvision to +#' automatically scale the images when performing image registration #' @param verbose Be verbose -#' @return List of registered giotto objects where the registered images and spatial locations +#' @return List of registered giotto objects where the registered images and +#' spatial locations #' @export -registerGiottoObjectList = function(gobject_list, - spat_unit = NULL, - method = c('fiji','rvision'), - image_unreg = 'image', - image_reg_name = 'image', - image_list = NULL, #Rvision - save_dir = NULL, #Rvision - spatloc_unreg = 'raw', - spatloc_reg_name = 'raw', - fiji_xml_files, - fiji_registered_images, - scale_factor = NULL, - allow_rvision_autoscale = TRUE, #Rvision - # auto_comp_reg_border = TRUE, - verbose = TRUE) { - - method = match.arg(method, choices = c('fiji','rvision')) - - if(method == 'fiji') { - gobject_list = registerGiottoObjectListFiji(gobject_list = gobject_list, - image_unreg = image_unreg, - image_reg_name = image_reg_name, - registered_images = fiji_registered_images, - spatloc_unreg = spatloc_unreg, - spatloc_reg_name = spatloc_reg_name, - xml_files = fiji_xml_files, - scale_factor = scale_factor, - # auto_comp_reg_border = auto_comp_reg_border, - verbose = verbose) - - } else if (method == 'rvision') { - gobject_list = registerGiottoObjectListRvision(gobject_list = gobject_list, - image_list = image_list, - save_dir = save_dir, - spatloc_unreg = spatloc_unreg, - spatloc_reg_name = spatloc_reg_name, - verbose = verbose) - - } else { - stop('Invalid method input\n Only fiji and rvision methods are currently supported.') - } - - return(gobject_list) +registerGiottoObjectList <- function(gobject_list, + spat_unit = NULL, + method = c("fiji", "rvision"), + image_unreg = "image", + image_reg_name = "image", + image_list = NULL, # Rvision + save_dir = NULL, # Rvision + spatloc_unreg = "raw", + spatloc_reg_name = "raw", + fiji_xml_files, + fiji_registered_images, + scale_factor = NULL, + allow_rvision_autoscale = TRUE, # Rvision + # auto_comp_reg_border = TRUE, + verbose = TRUE) { + method <- match.arg(method, choices = c("fiji", "rvision")) + + if (method == "fiji") { + gobject_list <- registerGiottoObjectListFiji( + gobject_list = gobject_list, + image_unreg = image_unreg, + image_reg_name = image_reg_name, + registered_images = fiji_registered_images, + spatloc_unreg = spatloc_unreg, + spatloc_reg_name = spatloc_reg_name, + xml_files = fiji_xml_files, + scale_factor = scale_factor, + # auto_comp_reg_border = auto_comp_reg_border, + verbose = verbose + ) + } else if (method == "rvision") { + gobject_list <- registerGiottoObjectListRvision( + gobject_list = gobject_list, + image_list = image_list, + save_dir = save_dir, + spatloc_unreg = spatloc_unreg, + spatloc_reg_name = spatloc_reg_name, + verbose = verbose + ) + } else { + stop("Invalid method input\n Only fiji and rvision methods are + currently supported.") + } + + return(gobject_list) } #' @title registerGiottoObjectListFiji #' 
@name registerGiottoObjectListFiji -#' @description Function to spatially align gobject data based on FIJI image registration. +#' @description Function to spatially align gobject data based on FIJI image +#' registration. #' @param gobject_list list of gobjects to register #' @param spat_unit spatial unit -#' @param image_unreg name of original unregistered images. Defaults to 'image' (optional) -#' @param image_reg_name arbitrary name for registered images to occupy. Defaults to replacement of 'image' (optional) -#' @param image_replace_name arbitrary name for any images replaced due to image_reg_name argument (optional) -#' @param registered_images registered images output by FIJI register_virtual_stack_slices +#' @param image_unreg name of original unregistered images. Defaults to +#' 'image' (optional) +#' @param image_reg_name arbitrary name for registered images to occupy. +#' Defaults to replacement of 'image' (optional) +#' @param image_replace_name arbitrary name for any images replaced due to +#' image_reg_name argument (optional) +#' @param registered_images registered images output by FIJI +#' register_virtual_stack_slices #' @param spatloc_unreg spatial locations to use. Defaults to 'raw' (optional) -#' @param spatloc_reg_name name for registered spatial locations. Defaults to replacement of 'raw' (optional) -#' @param spatloc_replace_name arbitrary name for any spatial locations replaced due to spatloc_reg_name argument (optional) -#' @param xml_files atomic vector of filepaths to xml outputs from FIJI register_virtual_stack_slices -#' @param scale_factor vector of scaling factors of images used in registration vs spatlocs +#' @param spatloc_reg_name name for registered spatial locations. Defaults to +#' replacement of 'raw' (optional) +#' @param spatloc_replace_name arbitrary name for any spatial locations +#' replaced due to spatloc_reg_name argument (optional) +#' @param xml_files atomic vector of filepaths to xml outputs from FIJI +#' register_virtual_stack_slices +#' @param scale_factor vector of scaling factors of images used in registration +#' vs spatlocs #' @param verbose be verbose -#' @return list of registered giotto objects where the registered images and spatial locations +#' @return list of registered giotto objects where the registered images and +#' spatial locations #' @export -registerGiottoObjectListFiji = function(gobject_list, - spat_unit = NULL, - image_unreg = 'image', - image_reg_name = 'image', - image_replace_name = 'unregistered', - registered_images = NULL, - spatloc_unreg = 'raw', - spatloc_reg_name = 'raw', - spatloc_replace_name = 'unregistered', - xml_files, - scale_factor = NULL, - verbose = TRUE) { - - # set spat_unit based on first gobject - spat_unit = set_default_spat_unit(gobject = gobject_list[[1]], - spat_unit = spat_unit) - - ## 0. 
Check Params ## - if(length(gobject_list) != length(xml_files)) { - stop('xml spatial transforms must be supplied for every gobject to be registered.\n') - } - - if(is.null(registered_images) == FALSE) { - # If there are not the same number of registered images as gobjects, stop - if(length(registered_images) != length(gobject_list)) { - stop('A registered image should be supplied for every gobject to align \n') - } - if(sum(as.logical(lapply(registered_images, methods::is, class2 = 'giottoImage'))) > 0) { - stop('Registered images should be supplied as either magick-objects or filepaths \n') - } - } +registerGiottoObjectListFiji <- function(gobject_list, + spat_unit = NULL, + image_unreg = "image", + image_reg_name = "image", + image_replace_name = "unregistered", + registered_images = NULL, + spatloc_unreg = "raw", + spatloc_reg_name = "raw", + spatloc_replace_name = "unregistered", + xml_files, + scale_factor = NULL, + verbose = TRUE) { + # set spat_unit based on first gobject + spat_unit <- set_default_spat_unit( + gobject = gobject_list[[1]], + spat_unit = spat_unit + ) - if(!is.null(scale_factor)) { - if(!is.numeric(scale_factor)) { - stop('scale_factor only accepts numerics') + ## 0. Check Params ## + if (length(gobject_list) != length(xml_files)) { + stop("xml spatial transforms must be supplied for every gobject to be + registered.") } - if((length(scale_factor) != length(gobject_list)) && (length(scale_factor) != 1)) { - stop('If more than one scale_factor is given, there must be one for each gobject to be registered. \n') - } - } - - - # scale_factors will always be given externally. Registered images do not have gobjects yet. - # expand scale_factor if given as a single value - scale_list = c() - if(length(scale_factor) == 1) { - scale_list = rep(scale_factor, length(gobject_list)) - } else { - scale_list = unlist(scale_factor) # ensure atomic vector - } - - - ## 1. Get spatial coordinates and put in lists # - spatloc_list = list() - for(gobj_i in seq_along(gobject_list)) { - gobj = gobject_list[[gobj_i]] - spatloc = get_spatial_locations(gobject = gobj, - spat_unit = spat_unit, - spat_loc_name = spatloc_unreg) - #------ Put all spatial location data together - spatloc_list[[gobj_i]] = spatloc - } - - - - ## 2. read transform xml files into list ## - transf_list = list() - for(file_i in seq_along(xml_files)) { - t_file = xml_files[[file_i]] - #------ Put all transform files together - transf_list[[file_i]] = paste(readLines(t_file, warn = F), collapse = '\n') - } - - # Select useful info out of the TrakEM2 files - transformsDF = lapply_flex(transf_list, .trakem2_rigid_transforms) - - - ## 3. 
apply transformation on spatial locations ## - # Scale by registered image's scale_factor - spatloc_list = lapply_flex( - seq_along(spatloc_list), - function(x) { - rescale(spatloc_list[[x]], scale_list[x], x0 = 0, y0 = 0) - } - ) - - # Register spatial locations - spatloc_list = lapply(seq_along(spatloc_list), - function(x) { - .rigid_transform_spatial_locations(spatlocs = spatloc_list[[x]], - transform_values = transformsDF[[x]], - method = 'fiji') - }) - - # Return scaling to spatloc original - spatloc_list = lapply_flex( - seq_along(spatloc_list), - function(x) { - rescale(spatloc_list[[x]], 1/(scale_list[x]), x0 = 0, y0 = 0) + + if (is.null(registered_images) == FALSE) { + # If there are not the same number of registered images as gobjects, + # stop + if (length(registered_images) != length(gobject_list)) { + stop("A registered image should be supplied for every gobject to + align") + } + if (sum(as.logical(lapply( + registered_images, methods::is, class2 = "giottoImage"))) > 0) { + stop("Registered images should be supplied as either magick-objects + or filepaths") + } } - ) - - ## 4. update Giotto slots and names and return list of Giotto object - - #Find new image boundaries for registered images - #Must have original pre-registration images in the gobject for this to work - #TODO (optional if just registering spatlocs) - reg_img_boundaries = .reg_img_minmax_finder(gobject_list = gobject_list, - image_unreg = image_unreg, - scale_factor = scale_list, - transform_values = transformsDF, - method = 'fiji') - - # Gobject updating for loop - for(gobj_i in seq_along(gobject_list)) { - gobj = gobject_list[[gobj_i]] - - - # Params check for conflicting names - if(verbose == TRUE) { - if(image_unreg == image_reg_name) { - cat('Registered image name already used. Previous image named ', image_reg_name,' renamed to ',image_replace_name,'. \n') - } - if(spatloc_unreg == spatloc_reg_name) { - cat('Registered spatloc name already used. Previous spatloc named ', spatloc_reg_name,' renamed to ', spatloc_replace_name,'. \n') - } + + if (!is.null(scale_factor)) { + if (!is.numeric(scale_factor)) { + stop("scale_factor only accepts numerics") + } + if ((length(scale_factor) != length(gobject_list)) && + (length(scale_factor) != 1)) { + stop("If more than one scale_factor is given, there must be one for + each gobject to be registered.") + } } - # Update Spatial - #Rename original spatial locations to 'unregistered' if conflicting with output - if(spatloc_unreg == spatloc_reg_name) { - gobj = set_spatial_locations(gobject = gobj, - spat_unit = spat_unit, - spat_loc_name = spatloc_replace_name, - spatlocs = get_spatial_locations(gobject = gobj, - spat_unit = spat_unit, - spat_loc_name = spatloc_unreg)) + # scale_factors will always be given externally. Registered images do not + # have gobjects yet. + # expand scale_factor if given as a single value + scale_list <- c() + if (length(scale_factor) == 1) { + scale_list <- rep(scale_factor, length(gobject_list)) + } else { + scale_list <- unlist(scale_factor) # ensure atomic vector } - #Assign registered spatial locations from spatloc_list to gobject_list - gobj = set_spatial_locations(gobject = gobj, - spat_unit = spat_unit, - spat_loc_name = spatloc_reg_name, - spatlocs = spatloc_list[[gobj_i]]) + ## 1. 
Get spatial coordinates and put in lists # + spatloc_list <- list() + for (gobj_i in seq_along(gobject_list)) { + gobj <- gobject_list[[gobj_i]] + spatloc <- get_spatial_locations( + gobject = gobj, + spat_unit = spat_unit, + spat_loc_name = spatloc_unreg + ) + #------ Put all spatial location data together + spatloc_list[[gobj_i]] <- spatloc + } - # Update images - #If there is an existing image with the image_reg_name, rename it "unregistered" - #Move the original image to 'unregistered' - if(image_unreg == image_reg_name) { - gobj@images[[image_replace_name]] = gobj@images[[image_unreg]] + ## 2. read transform xml files into list ## + transf_list <- list() + for (file_i in seq_along(xml_files)) { + t_file <- xml_files[[file_i]] + #------ Put all transform files together + transf_list[[file_i]] <- paste( + readLines(t_file, warn = FALSE), collapse = "\n") } + # Select useful info out of the TrakEM2 files + transformsDF <- lapply_flex(transf_list, .trakem2_rigid_transforms) - #Create a giotto image if there are registered images supplied - if(!is.null(registered_images)) { - g_image = createGiottoImage(gobject = gobj, - spat_unit = spat_unit, - spatial_locs = spatloc_reg_name, - mg_object = registered_images[[gobj_i]], - name = image_reg_name, - scale_factor = scale_list[[gobj_i]]) - #Add the registered image to the gobj. - gobj = addGiottoImageMG(gobject = gobj, - spat_unit = spat_unit, - spat_loc_name = spatloc_reg_name, - images = list(g_image)) + ## 3. apply transformation on spatial locations ## + # Scale by registered image's scale_factor + spatloc_list <- lapply_flex( + seq_along(spatloc_list), + function(x) { + rescale(spatloc_list[[x]], scale_list[x], x0 = 0, y0 = 0) + } + ) - #Automatic adjustment - if(exists('reg_img_boundaries')){ #TODO - im_info = gobj@images[[image_reg_name]]@minmax + # Register spatial locations + spatloc_list <- lapply( + seq_along(spatloc_list), + function(x) { + .rigid_transform_spatial_locations( + spatlocs = spatloc_list[[x]], + transform_values = transformsDF[[x]], + method = "fiji" + ) + } + ) - #update the giottoImage boundaries - boundaries <- as.numeric(c(reg_img_boundaries$xmax_reg - im_info[['xmax_sloc']], - im_info[['xmin_sloc']] - reg_img_boundaries$xmin_reg, - reg_img_boundaries$ymax_reg - im_info[['ymax_sloc']], - im_info[['ymin_sloc']] - reg_img_boundaries$ymin_reg)) + # Return scaling to spatloc original + spatloc_list <- lapply_flex( + seq_along(spatloc_list), + function(x) { + rescale(spatloc_list[[x]], 1 / (scale_list[x]), x0 = 0, y0 = 0) + } + ) - names(boundaries) = c('xmax_adj','xmin_adj','ymax_adj','ymin_adj') + ## 4. update Giotto slots and names and return list of Giotto object + + # Find new image boundaries for registered images + # Must have original pre-registration images in the gobject for this to work + # TODO (optional if just registering spatlocs) + reg_img_boundaries <- .reg_img_minmax_finder( + gobject_list = gobject_list, + image_unreg = image_unreg, + scale_factor = scale_list, + transform_values = transformsDF, + method = "fiji" + ) - gobj@images[[image_reg_name]]@boundaries = boundaries + # Gobject updating for loop + for (gobj_i in seq_along(gobject_list)) { + gobj <- gobject_list[[gobj_i]] + + + # Params check for conflicting names + if (verbose == TRUE) { + if (image_unreg == image_reg_name) { + cat("Registered image name already used. Previous image named ", + image_reg_name, " renamed to ", image_replace_name) + } + if (spatloc_unreg == spatloc_reg_name) { + cat("Registered spatloc name already used. 
+ Previous spatloc named ", spatloc_reg_name, + " renamed to ", spatloc_replace_name) + } + } + + + # Update Spatial + # Rename original spatial locations to 'unregistered' if conflicting + # with output + if (spatloc_unreg == spatloc_reg_name) { + gobj <- set_spatial_locations( + gobject = gobj, + spat_unit = spat_unit, + spat_loc_name = spatloc_replace_name, + spatlocs = get_spatial_locations( + gobject = gobj, + spat_unit = spat_unit, + spat_loc_name = spatloc_unreg + ) + ) + } + + + # Assign registered spatial locations from spatloc_list to gobject_list + gobj <- set_spatial_locations( + gobject = gobj, + spat_unit = spat_unit, + spat_loc_name = spatloc_reg_name, + spatlocs = spatloc_list[[gobj_i]] + ) - } - } - gobject_list[[gobj_i]] = gobj - } # gobj update loop end - return(gobject_list) + + # Update images + # If there is an existing image with the image_reg_name, rename it + # "unregistered" + # Move the original image to 'unregistered' + if (image_unreg == image_reg_name) { + gobj@images[[image_replace_name]] <- gobj@images[[image_unreg]] + } + + + # Create a giotto image if there are registered images supplied + if (!is.null(registered_images)) { + g_image <- createGiottoImage( + gobject = gobj, + spat_unit = spat_unit, + spatial_locs = spatloc_reg_name, + mg_object = registered_images[[gobj_i]], + name = image_reg_name, + scale_factor = scale_list[[gobj_i]] + ) + + # Add the registered image to the gobj. + gobj <- addGiottoImageMG( + gobject = gobj, + spat_unit = spat_unit, + spat_loc_name = spatloc_reg_name, + images = list(g_image) + ) + + # Automatic adjustment + if (exists("reg_img_boundaries")) { # TODO + im_info <- gobj@images[[image_reg_name]]@minmax + + # update the giottoImage boundaries + boundaries <- as.numeric(c( + reg_img_boundaries$xmax_reg - im_info[["xmax_sloc"]], + im_info[["xmin_sloc"]] - reg_img_boundaries$xmin_reg, + reg_img_boundaries$ymax_reg - im_info[["ymax_sloc"]], + im_info[["ymin_sloc"]] - reg_img_boundaries$ymin_reg + )) + + names(boundaries) <- c( + "xmax_adj", "xmin_adj", "ymax_adj", "ymin_adj") + + gobj@images[[image_reg_name]]@boundaries <- boundaries + } + } + gobject_list[[gobj_i]] <- gobj + } # gobj update loop end + return(gobject_list) } -#TODO check if spatloc is actually provided in createGiottoImage() and ignore auto align if not. +# TODO check if spatloc is actually provided in createGiottoImage() and ignore +# auto align if not. #' @title registerGiottoObjectListRvision #' @name registerGiottoObjectListRvision -#' @description Function to spatially align gobject data based on Rvision image registration. +#' @description Function to spatially align gobject data based on Rvision image +#' registration. #' @param gobject_list list of gobjects to register #' @param image_list Filepaths to unregistered images #' @param save_dir (Optional) If given, save registered images to this directory #' @param spatloc_unreg spatial locations to use -#' @param spatloc_reg_name name for registered spatial locations to. Defaults to replacement of spat_unreg (optional) +#' @param spatloc_reg_name name for registered spatial locations to. 
Defaults +#' to replacement of spat_unreg (optional) #' @param verbose be verbose -#' @return list of registered giotto objects where the registered images and spatial locations +#' @return list of registered giotto objects where the registered images and +#' spatial locations #' @export -#Register giotto objects when given raw images and spatial locations -registerGiottoObjectListRvision = function(gobject_list = gobject_list, - image_list = NULL, - save_dir = NULL, - spatloc_unreg = NULL, - spatloc_reg_name = 'raw', - verbose = TRUE) { #Not used - - package_check(pkg_name = 'Rvision', - repository = c('github'), - github_repo = 'swarm-lab/Rvision') - - ## 1. get spatial coordinates and put in list ## - spatloc_list = list() - for(gobj_i in seq_along(gobject_list)) { - gobj = gobject_list[[gobj_i]] - spatloc = get_spatial_locations(gobject = gobj, - spat_loc_name = spatloc_unreg, - output = 'spatLocsObj', - copy_obj = TRUE) - # Put all spatial location data together - spatloc_list[[gobj_i]] = spatloc - } - - ## 2. Load images into list - if (length(spatloc_list) != length(image_list)) { - stop('images must be supplied for every gobject to be registered.') - } - - unreg_images <- c() - color_images <- c() - for (path in image_list) { - unreg_images <- append(unreg_images, Rvision::image(filename = path), after = length(unreg_images)) - color_images <- append(color_images, Rvision::image(filename = path), after = length(color_images)) - } - - ## 3. Perform preprocessing - rows <- c() - cols <- c() - for (image_i in seq_along(unreg_images)) { - # Make images grayscale - Rvision::changeColorSpace(unreg_images[[image_i]], colorspace = "GRAY", target = "self") - # Retrieve image dimensions - dims <- dim(unreg_images[[image_i]]) - rows <- append(rows, dims[[1]], after = length(rows)) - cols <- append(cols, dims[[2]], after = length(cols)) - } - maxes <- c(max(cols),max(rows)) - squmax <- max(maxes) - rm(dims, maxes) - - enddim <- 500 - for (i in seq_along(unreg_images)) { - # Add border so all images have same square dimensions - Rvision::border(unreg_images[[i]], squmax-rows[[i]], 0, squmax-cols[[i]], 0, border_color = "white", target = "self") - Rvision::border(color_images[[i]], squmax-rows[[i]], 0, squmax-cols[[i]], 0, border_color = "white", target = "self") - # Apply scaling so all images of reasonable size for processing - unreg_images[[i]] <- Rvision::resize(unreg_images[[i]], height = enddim, width = enddim, target = "new") - color_images[[i]] <- Rvision::resize(color_images[[i]], height = enddim, width = enddim, target = "new") - } - rm(cols,rows) - - ## 4. Compute transformations - # Choose reference image - refImage <- unreg_images[[base::floor(length(unreg_images)/2)]] - - # Compute ECC transforms - transfs <- base::vector(mode = "list", length = length(unreg_images)) - for (i in seq_along(unreg_images)) { - transfs[[i]] <- Rvision::findTransformECC(refImage, unreg_images[[i]], warp_mode = "euclidean", filt_size = 101) - } - rm(refImage) - - ## 5. Apply transform - reg_images <- c() - for (i in seq_along(unreg_images)) { - # Apply scaling - spatloc_list[[i]][] <- rescale(spatloc_list[[i]][], enddim/squmax, x0 = 0, y0 = 0) - # Apply transform to spatlocs - spatloc_list[[i]][] <- .rigid_transform_spatial_locations(spatloc_list[[i]][], transfs[[i]], method = 'rvision') - } - rm(squmax, enddim) - - ## 6. 
Update giotto object - for(gobj_i in seq_along(gobject_list)) { - gobj = gobject_list[[gobj_i]] - #Rename original spatial locations to 'unregistered' - - unreg_locs = get_spatial_locations(gobj, - spat_loc_name = spatloc_unreg, - copy_obj = FALSE, - output = 'spatLocsObj') - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobj = set_spatial_locations(gobj, - spatlocs = unreg_locs, - spat_loc_name = 'unregistered') - - #Assign registered spatial locations from spatloc_list to gobject_list - gobj = set_spatial_locations(gobj, - spatlocs = spatloc_list[[gobj_i]], - spat_loc_name = spatloc_reg_name) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - - gobject_list[[gobj_i]] = gobj - } - - ## 7. Save transformed images - if (!is.null(save_dir)) { - # Apply transform to image - transf_images <- c() +# Register giotto objects when given raw images and spatial locations +registerGiottoObjectListRvision <- function(gobject_list = gobject_list, + image_list = NULL, + save_dir = NULL, + spatloc_unreg = NULL, + spatloc_reg_name = "raw", + verbose = TRUE) { # Not used + + package_check( + pkg_name = "Rvision", + repository = c("github"), + github_repo = "swarm-lab/Rvision" + ) + + ## 1. get spatial coordinates and put in list ## + spatloc_list <- list() + for (gobj_i in seq_along(gobject_list)) { + gobj <- gobject_list[[gobj_i]] + spatloc <- get_spatial_locations( + gobject = gobj, + spat_loc_name = spatloc_unreg, + output = "spatLocsObj", + copy_obj = TRUE + ) + # Put all spatial location data together + spatloc_list[[gobj_i]] <- spatloc + } + + ## 2. Load images into list + if (length(spatloc_list) != length(image_list)) { + stop("images must be supplied for every gobject to be registered.") + } + + unreg_images <- c() + color_images <- c() + for (path in image_list) { + unreg_images <- append( + unreg_images, Rvision::image(filename = path), + after = length(unreg_images)) + color_images <- append( + color_images, Rvision::image(filename = path), + after = length(color_images)) + } + + ## 3. Perform preprocessing + rows <- c() + cols <- c() + for (image_i in seq_along(unreg_images)) { + # Make images grayscale + Rvision::changeColorSpace( + unreg_images[[image_i]], colorspace = "GRAY", target = "self") + # Retrieve image dimensions + dims <- dim(unreg_images[[image_i]]) + rows <- append(rows, dims[[1]], after = length(rows)) + cols <- append(cols, dims[[2]], after = length(cols)) + } + maxes <- c(max(cols), max(rows)) + squmax <- max(maxes) + rm(dims, maxes) + + enddim <- 500 for (i in seq_along(unreg_images)) { - transf_images <- append(transf_images, Rvision::warpAffine(color_images[[i]], transfs[[i]], target = "new"), length(transf_images)) + # Add border so all images have same square dimensions + Rvision::border( + unreg_images[[i]], squmax - rows[[i]], 0, + squmax - cols[[i]], 0, border_color = "white", target = "self") + Rvision::border( + color_images[[i]], squmax - rows[[i]], 0, + squmax - cols[[i]], 0, border_color = "white", target = "self") + # Apply scaling so all images of reasonable size for processing + unreg_images[[i]] <- Rvision::resize( + unreg_images[[i]], height = enddim, width = enddim, target = "new") + color_images[[i]] <- Rvision::resize( + color_images[[i]], height = enddim, width = enddim, target = "new") } - # Save images to save directory - for(image_i in seq_along(transf_images)) { - name <- paste(save_dir, image_i, ".jpg") - Rvision::write.Image(transf_images[[image_i]], name) + rm(cols, rows) + + ## 4. 
Compute transformations + # Choose reference image + refImage <- unreg_images[[base::floor(length(unreg_images) / 2)]] + + # Compute ECC transforms + transfs <- base::vector(mode = "list", length = length(unreg_images)) + for (i in seq_along(unreg_images)) { + transfs[[i]] <- Rvision::findTransformECC( + refImage, unreg_images[[i]], warp_mode = "euclidean", + filt_size = 101) + } + rm(refImage) + + ## 5. Apply transform + reg_images <- c() + for (i in seq_along(unreg_images)) { + # Apply scaling + spatloc_list[[i]][] <- rescale( + spatloc_list[[i]][], enddim / squmax, x0 = 0, y0 = 0) + # Apply transform to spatlocs + spatloc_list[[i]][] <- .rigid_transform_spatial_locations( + spatloc_list[[i]][], transfs[[i]], method = "rvision") } - } + rm(squmax, enddim) + + ## 6. Update giotto object + for (gobj_i in seq_along(gobject_list)) { + gobj <- gobject_list[[gobj_i]] + # Rename original spatial locations to 'unregistered' - return(gobject_list) + unreg_locs <- get_spatial_locations(gobj, + spat_loc_name = spatloc_unreg, + copy_obj = FALSE, + output = "spatLocsObj" + ) + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobj <- set_spatial_locations(gobj, + spatlocs = unreg_locs, + spat_loc_name = "unregistered" + ) + + # Assign registered spatial locations from spatloc_list to gobject_list + gobj <- set_spatial_locations(gobj, + spatlocs = spatloc_list[[gobj_i]], + spat_loc_name = spatloc_reg_name + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + gobject_list[[gobj_i]] <- gobj + } + + ## 7. Save transformed images + if (!is.null(save_dir)) { + # Apply transform to image + transf_images <- c() + for (i in seq_along(unreg_images)) { + transf_images <- append(transf_images, Rvision::warpAffine( + color_images[[i]], transfs[[i]], target = "new"), + length(transf_images)) + } + # Save images to save directory + for (image_i in seq_along(transf_images)) { + name <- paste(save_dir, image_i, ".jpg") + Rvision::write.Image(transf_images[[image_i]], name) + } + } + + return(gobject_list) } ### FIJI related functions #### @@ -640,41 +762,51 @@ registerGiottoObjectListRvision = function(gobject_list = gobject_list, #' # you can also set options(giotto.fiji="/some/path") #' fiji("/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx") #' } -fiji = function(fijiPath = NULL) { - if(!is.null(fijiPath)) { - if(!file.exists(fijiPath)) - stop("fiji is not at: ", fijiPath) - } else { - # do we have an option set? - fijiPath=getOption('giotto.fiji') - if(!is.null(fijiPath)) { - if(!file.exists(fijiPath)) - stop("fiji is not at: ", fijiPath, " as specified by options('giotto.fiji')!") +fiji <- function(fijiPath = NULL) { + if (!is.null(fijiPath)) { + if (!file.exists(fijiPath)) { + stop("fiji is not at: ", fijiPath) + } } else { - # look for it in sensible places - if(!nzchar(fijiPath <- Sys.which('fiji'))) { - macapp="/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx" - if(file.exists(macapp)) - fijiPath = macapp - else - stop("Unable to find fiji!", - "Set options('giotto.fiji') to point to the fiji command line executable!") - } + # do we have an option set? 
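# Hedged usage sketch for the registration wrappers above. All object and file
# names are hypothetical placeholders; the FIJI route assumes the xml transform
# files and registered images were generated beforehand with the Register
# Virtual Stack Slices plugin.
registered_gobjects <- registerGiottoObjectList(
    gobject_list = list(gobj_slice1, gobj_slice2),  # hypothetical giotto objects
    method = "fiji",
    fiji_xml_files = c("slice1.xml", "slice2.xml"), # TrakEM2 transform xmls
    fiji_registered_images = c("slice1_reg.tif", "slice2_reg.tif"),
    scale_factor = 1,                               # image vs spatloc scaling
    spatloc_unreg = "raw",
    spatloc_reg_name = "raw"                        # replaces the 'raw' slot
)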
+ fijiPath <- getOption("giotto.fiji") + if (!is.null(fijiPath)) { + if (!file.exists(fijiPath)) { + stop("fiji is not at: ", fijiPath, + " as specified by options('giotto.fiji')!") + } + } else { + # look for it in sensible places + if (!nzchar(fijiPath <- Sys.which("fiji"))) { + macapp <- "/Applications/Fiji.app/Contents/MacOS/ImageJ-macosx" + if (file.exists(macapp)) { + fijiPath <- macapp + } else { + stop( + "Unable to find fiji!", + "Set options('giotto.fiji') to point to the fiji + command line executable!" + ) + } + } + } } - } - fijiPath = normalizePath(fijiPath) - options(giotto.fiji=fijiPath) - fijiPath + fijiPath <- normalizePath(fijiPath) + options(giotto.fiji = fijiPath) + fijiPath } #' @title registerImagesFIJI #' @name registerImagesFIJI -#' @description Wrapper function for Register Virtual Stack Slices plugin in FIJI +#' @description Wrapper function for Register Virtual Stack Slices plugin in +#' FIJI #' @param source_img_dir Folder containing images to be registered #' @param output_img_dir Folder to save registered images to -#' @param transforms_save_dir (jython implementation only) Folder to save transforms to -#' @param ref_img_name (jython implementation only) File name of reference image for the registration +#' @param transforms_save_dir (jython implementation only) Folder to save +#' transforms to +#' @param ref_img_name (jython implementation only) File name of reference +#' image for the registration #' @param init_gauss_blur Point detector option: initial image blurring #' @param steps_per_scale_octave Point detector option #' @param min_img_size Point detector option @@ -697,126 +829,138 @@ fiji = function(fijiPath = NULL) { #' \code{options(giotto.fiji="/some/path")}) #' @param DryRun Whether to return the command to be run rather than actually #' executing it. -#' @return list of registered giotto objects where the registered images and spatial locations -#' @details This function was adapted from runFijiMacro function in jimpipeline by jefferislab +#' @return list of registered giotto objects where the registered images and +#' spatial locations +#' @details This function was adapted from runFijiMacro function in +#' jimpipeline by jefferislab #' #' @export -registerImagesFIJI = function(source_img_dir, - output_img_dir, - transforms_save_dir, - ref_img_name, - #Scale Invariant Interest Point Detector Options - init_gauss_blur = 1.6, - steps_per_scale_octave = 3, - min_img_size = 64, - max_img_size = 1024, - #Feature Descriptor Options - feat_desc_size = 8, - feat_desc_orient_bins = 8, - closest_next_closest_ratio = 0.92, - #Geometric Consensus Filter Options - max_align_err = 25, - inlier_ratio = 0.05, - #FIJI Options - headless = FALSE, - batch = TRUE, - MinMem = MaxMem, - MaxMem = 2500, - IncrementalGC = TRUE, - Threads = NULL, - fijiArgs = NULL, - javaArgs = NULL, - ijArgs = NULL, - jython = FALSE, - fijiPath = fiji(), - DryRun = FALSE) { - - #Check if output directory exists. 
If not, create the directory - if(!file.exists(output_img_dir)) { - dir.create(output_img_dir) - } - - #expand the paths of source and output - source_img_dir = path.expand(source_img_dir) - output_img_dir = path.expand(output_img_dir) - - - if(headless) fijiArgs = c(fijiArgs,"--headless") - fijiArgs=paste(fijiArgs,collapse=" ") - - javaArgs=c(paste("-Xms",MinMem,'m',sep=""),paste("-Xmx",MaxMem,'m',sep=""),javaArgs) - if(IncrementalGC) javaArgs=c(javaArgs,"-Xincgc") - javaArgs=paste(javaArgs,collapse=" ") - - threadAdjust=ifelse(is.null(Threads),"",paste("run(\"Memory & Threads...\", \"parallel=",Threads,"\");",sep="")) - - if(jython == TRUE) { - #TODO Add check to see if jython script is installed. - cat('jython implementation requires Headless_RVSS.py in "/Giotto/inst/fiji/" to be copied to "/Applications/Fiji.app/plugins/Scripts/MyScripts/Headless_RVSS.py" \n') - - macroCall=paste(" -eval '", - threadAdjust, - "run(\"Headless RVSS\", \"source_dir=[", - source_img_dir, - "] target_dir=[", - output_img_dir, - "] transf_dir=[", - transforms_save_dir, - "] reference_name=[", - ref_img_name, - "] init_gauss_blur=", - init_gauss_blur, - " steps_per_scale_octave=", - steps_per_scale_octave, - " min_img_size=", - min_img_size, - " max_img_size=", - max_img_size, - " feat_desc_size=", - feat_desc_size, - " feat_desc_orient_bins=", - feat_desc_orient_bins, - " closest_next_closest_ratio=", - closest_next_closest_ratio, - " max_align_err=", - max_align_err, - " minInlierRatio=", - inlier_ratio, - " interpolate=TRUE\");' ", - sep="") - } else { - macroCall=paste(" -eval '", - threadAdjust, - "run(\"Register Virtual Stack Slices\", \"source=[", - source_img_dir, - "] output=[", - output_img_dir, - "] feature=Similarity registration=[Rigid -- translate + rotate ] advanced save initial_gaussian_blur=", - init_gauss_blur, - " steps_per_scale_octave=", - steps_per_scale_octave, - " minimum_image_size=", - min_img_size, - " maximum_image_size=", - max_img_size, - " feature_descriptor_size=", - feat_desc_size, - " feature_descriptor_orientation_bins=", - feat_desc_orient_bins, - " closest/next_closest_ratio=", - closest_next_closest_ratio, - " maximal_alignment_error=", - max_align_err, - " inlier_ratio=", - inlier_ratio, - " feature_extraction_model=Similarity registration_model=[Rigid -- translate + rotate ] interpolate\");' ",sep="") - } - - ijArgs=paste(c(ijArgs,ifelse(batch,"-batch","")),collapse=" ") - - cmd<-paste(fijiPath,javaArgs,fijiArgs,"--",macroCall,ijArgs) - if(DryRun) return(cmd) - return(0==system(cmd)) +registerImagesFIJI <- function(source_img_dir, + output_img_dir, + transforms_save_dir, + ref_img_name, + # Scale Invariant Interest Point Detector Options + init_gauss_blur = 1.6, + steps_per_scale_octave = 3, + min_img_size = 64, + max_img_size = 1024, + # Feature Descriptor Options + feat_desc_size = 8, + feat_desc_orient_bins = 8, + closest_next_closest_ratio = 0.92, + # Geometric Consensus Filter Options + max_align_err = 25, + inlier_ratio = 0.05, + # FIJI Options + headless = FALSE, + batch = TRUE, + MinMem = MaxMem, + MaxMem = 2500, + IncrementalGC = TRUE, + Threads = NULL, + fijiArgs = NULL, + javaArgs = NULL, + ijArgs = NULL, + jython = FALSE, + fijiPath = fiji(), + DryRun = FALSE) { + # Check if output directory exists. 
If not, create the directory + if (!file.exists(output_img_dir)) { + dir.create(output_img_dir) + } + + # expand the paths of source and output + source_img_dir <- path.expand(source_img_dir) + output_img_dir <- path.expand(output_img_dir) + + + if (headless) fijiArgs <- c(fijiArgs, "--headless") + fijiArgs <- paste(fijiArgs, collapse = " ") + + javaArgs <- c(paste("-Xms", MinMem, "m", sep = ""), + paste("-Xmx", MaxMem, "m", sep = ""), javaArgs) + if (IncrementalGC) javaArgs <- c(javaArgs, "-Xincgc") + javaArgs <- paste(javaArgs, collapse = " ") + + threadAdjust <- ifelse( + is.null(Threads), "", + paste("run(\"Memory & Threads...\", \"parallel=", Threads, "\");", + sep = "")) + + if (jython == TRUE) { + # TODO Add check to see if jython script is installed. + message('jython implementation requires Headless_RVSS.py in + "/Giotto/inst/fiji/" to be copied to + "/Applications/Fiji.app/plugins/Scripts/MyScripts/Headless_RVSS.py"') + + macroCall <- paste(" -eval '", + threadAdjust, + "run(\"Headless RVSS\", \"source_dir=[", + source_img_dir, + "] target_dir=[", + output_img_dir, + "] transf_dir=[", + transforms_save_dir, + "] reference_name=[", + ref_img_name, + "] init_gauss_blur=", + init_gauss_blur, + " steps_per_scale_octave=", + steps_per_scale_octave, + " min_img_size=", + min_img_size, + " max_img_size=", + max_img_size, + " feat_desc_size=", + feat_desc_size, + " feat_desc_orient_bins=", + feat_desc_orient_bins, + " closest_next_closest_ratio=", + closest_next_closest_ratio, + " max_align_err=", + max_align_err, + " minInlierRatio=", + inlier_ratio, + " interpolate=TRUE\");' ", + sep = "" + ) + } else { + macroCall <- paste(" -eval '", + threadAdjust, + "run(\"Register Virtual Stack Slices\", \"source=[", + source_img_dir, + "] output=[", + output_img_dir, + "] feature=Similarity registration=[Rigid -- translate + rotate ] advanced save initial_gaussian_blur=", + init_gauss_blur, + " steps_per_scale_octave=", + steps_per_scale_octave, + " minimum_image_size=", + min_img_size, + " maximum_image_size=", + max_img_size, + " feature_descriptor_size=", + feat_desc_size, + " feature_descriptor_orientation_bins=", + feat_desc_orient_bins, + " closest/next_closest_ratio=", + closest_next_closest_ratio, + " maximal_alignment_error=", + max_align_err, + " inlier_ratio=", + inlier_ratio, + " feature_extraction_model=Similarity registration_model=[Rigid -- translate + rotate ] interpolate\");' ", + sep = "" + ) + } + + ijArgs <- paste(c(ijArgs, ifelse(batch, "-batch", "")), collapse = " ") + + cmd <- paste(fijiPath, javaArgs, fijiArgs, "--", macroCall, ijArgs) + if (DryRun) { + return(cmd) + } + return(0 == system(cmd)) } @@ -832,42 +976,43 @@ registerImagesFIJI = function(source_img_dir, #' @return a list of transforms information. 
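#' @examples
#' # A minimal sketch with a hypothetical 2x3 affine matrix (unit scale,
#' # no shear, translation of (100, -50)); the values are illustrative only.
#' aff <- matrix(c(1, 0, 100,
#'                 0, 1, -50), nrow = 2, byrow = TRUE)
#' parse_affine(aff)
#' # expected: scale c(x = 1, y = 1), rotate 0, shear c(x = 0, y = 0),
#' # translate c(x = 100, y = -50)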
#' @keywords internal parse_affine <- function(x) { - x <- as.matrix(x) - scale_x <- x[[1, 1]] - shear_x <- x[[1, 2]] - translate_x <- x[[1, 3]] - scale_y <- x[[2, 2]] - shear_y <- x[[2, 1]] - translate_y <- x[[2, 3]] - - list( - scale = c(x = scale_x, y = scale_y), - rotate = atan(shear_x/scale_x) + atan(shear_y/scale_y), - shear = c(x = shear_x, y = shear_y), - translate = c(x = translate_x, y = translate_y) - ) + x <- as.matrix(x) + scale_x <- x[[1, 1]] + shear_x <- x[[1, 2]] + translate_x <- x[[1, 3]] + scale_y <- x[[2, 2]] + shear_y <- x[[2, 1]] + translate_y <- x[[2, 3]] + + list( + scale = c(x = scale_x, y = scale_y), + rotate = atan(shear_x / scale_x) + atan(shear_y / scale_y), + shear = c(x = shear_x, y = shear_y), + translate = c(x = translate_x, y = translate_y) + ) } -#TODO - merge jython function into normal register FIJI -#TODO - add in manual rigid registration when given a transforms table +# TODO - merge jython function into normal register FIJI +# TODO - add in manual rigid registration when given a transforms table ### Under Construction #### # resizeImagesFIJI = function(fiji = fiji()) {} -#TODO - install FIJI jython registration and resize scripts +# TODO - install FIJI jython registration and resize scripts # install_FIJI_scripts = function(fiji = fiji()) {} -#TODO These things require a correct set of boundary values -# - Subset images in Giotto using Magick and followup reassignment as the default 'image' +# TODO These things require a correct set of boundary values +# - Subset images in Giotto using Magick and followup reassignment as the +# default 'image' # - Follow this up with potential registration -# - Need a way to determine the pixel distances between spots to get an idea of which regions of image 'belong' to a spot -# - Would be nice to be able to put together an image mask even in magick and apply it to the image to aid with img_reg and take care of jagged lines after image subsetting +# - Need a way to determine the pixel distances between spots to get an idea of +# which regions of image 'belong' to a spot +# - Would be nice to be able to put together an image mask even in magick and +# apply it to the image to aid with img_reg and take care of jagged lines after +# image subsetting # - A shiny app to subset tissue regions would be nice # The shiny app should be able to select spots in a 2d plane by default -# If given the ability, it should also select spots of a single plane or within a certain range of z values and plot them as a 2D for selection purposes - - - - +# If given the ability, it should also select spots of a single plane or within +# a certain range of z values and plot them as a 2D for selection purposes diff --git a/R/interactivity.R b/R/interactivity.R index 21987bb2d..4703299fe 100644 --- a/R/interactivity.R +++ b/R/interactivity.R @@ -1,9 +1,9 @@ - # ** interactive plotting #### #' Select image regions by plotting interactive polygons #' -#' @description Plot interactive polygons on an image and retrieve the polygons coordinates. +#' @description Plot interactive polygons on an image and retrieve the polygons +#' coordinates. #' @param x A `ggplot` or `rast` plot object to draw polygons on #' @param width,height An integer, defining the width/height in pixels. #' @param ... Graphical parameters passed on to `polygon` or `geom_point`. @@ -12,93 +12,124 @@ #' #' @export plotInteractivePolygons <- function(x, - width = "auto", - height = "auto", - ...) 
{ - - package_check(pkg_name = 'miniUI', repository = 'CRAN') - package_check(pkg_name = 'shiny', repository = 'CRAN') - - # data.table variables - y = name = NULL - - if(is.null(x)) stop('plot object is empty') - - ## find min and max values for spatRaster image - if("SpatRaster" %in% class(x)) { - ui <- miniUI::miniPage( - miniUI::gadgetTitleBar("Plot Interactive Polygons"), - miniUI::miniContentPanel( - shiny::textInput("polygon_name", label = "Polygon name", value = "polygon 1"), - shiny::sliderInput("xrange", label = "x coordinates", - min = min(terra::ext(x))[1], - max = max(terra::ext(x))[1], - value = c(min(terra::ext(x))[1], - max(terra::ext(x))[1])) , - shiny::sliderInput("yrange", label = "y coordinates", - min = min(terra::ext(x))[2], - max = max(terra::ext(x))[2], - value = c(min(terra::ext(x))[2], - max(terra::ext(x))[2])) , - shiny::plotOutput("plot", click = "plot_click") - ) - ) + width = "auto", + height = "auto", + ...) { + package_check(pkg_name = "miniUI", repository = "CRAN") + package_check(pkg_name = "shiny", repository = "CRAN") + + # data.table variables + y <- name <- NULL + + if (is.null(x)) stop("plot object is empty") + + ## find min and max values for spatRaster image + if ("SpatRaster" %in% class(x)) { + ui <- miniUI::miniPage( + miniUI::gadgetTitleBar("Plot Interactive Polygons"), + miniUI::miniContentPanel( + shiny::textInput( + "polygon_name", label = "Polygon name", + value = "polygon 1"), + shiny::sliderInput("xrange", + label = "x coordinates", + min = min(terra::ext(x))[1], + max = max(terra::ext(x))[1], + value = c( + min(terra::ext(x))[1], + max(terra::ext(x))[1] + ) + ), + shiny::sliderInput("yrange", + label = "y coordinates", + min = min(terra::ext(x))[2], + max = max(terra::ext(x))[2], + value = c( + min(terra::ext(x))[2], + max(terra::ext(x))[2] + ) + ), + shiny::plotOutput("plot", click = "plot_click") + ) + ) + } else { ## find min and max values for non-spatRaster image + ui <- miniUI::miniPage( + miniUI::gadgetTitleBar("Plot Interactive Polygons"), + miniUI::miniContentPanel( + shiny::textInput( + "polygon_name", label = "Polygon name", + value = "polygon 1"), + shiny::sliderInput("xrange", + label = "x coordinates", + min = min(x[["layers"]][[1]]$data$sdimx), + max = max(x[["layers"]][[1]]$data$sdimx), + value = c( + min(x[["layers"]][[1]]$data$sdimx), + max(x[["layers"]][[1]]$data$sdimx) + ) + ), + shiny::sliderInput("yrange", + label = "y coordinates", + min = min(x[["layers"]][[1]]$data$sdimy), + max = max(x[["layers"]][[1]]$data$sdimy), + value = c( + min(x[["layers"]][[1]]$data$sdimy), + max(x[["layers"]][[1]]$data$sdimy) + ) + ), + shiny::plotOutput("plot", click = "plot_click") + ) + ) + } - } else { ## find min and max values for non-spatRaster image - ui <- miniUI::miniPage( - miniUI::gadgetTitleBar("Plot Interactive Polygons"), - miniUI::miniContentPanel( - shiny::textInput("polygon_name", label = "Polygon name", value = "polygon 1"), - shiny::sliderInput("xrange", label = "x coordinates", - min = min(x[["layers"]][[1]]$data$sdimx), - max = max(x[["layers"]][[1]]$data$sdimx), - value = c(min(x[["layers"]][[1]]$data$sdimx), - max(x[["layers"]][[1]]$data$sdimx))) , - shiny::sliderInput("yrange", label = "y coordinates", - min = min(x[["layers"]][[1]]$data$sdimy), - max = max(x[["layers"]][[1]]$data$sdimy), - value = c(min(x[["layers"]][[1]]$data$sdimy), - max(x[["layers"]][[1]]$data$sdimy))) , - shiny::plotOutput("plot", click = "plot_click") - ) - ) - } + server <- function(input, output, session) { + output$plot <- 
shiny::renderPlot( + { + if ("ggplot" %in% class(x)) { + x$coordinates$default <- TRUE + x + + geom_polygon( + data = clicklist(), + aes(x, y, color = name, fill = name), + alpha = 0, ... + ) + + coord_fixed( + xlim = c(input$xrange[1], input$xrange[2]), + ylim = c(input$yrange[1], input$yrange[2]) + ) + + theme(legend.position = "none") + } else { + terra::plot(x) + lapply(split(clicklist(), by = "name"), + function(x) graphics::polygon(x$x, x$y, ...)) + } + }, + res = 96, + width = width, + height = height + ) - server <- function(input, output,session) { - output$plot <- shiny::renderPlot({ - if ("ggplot" %in% class(x)) { - x$coordinates$default <- TRUE - x + - geom_polygon(data = clicklist(), aes(x,y, color = name, fill = name), - alpha = 0, ...) + - coord_fixed(xlim = c(input$xrange[1], input$xrange[2]), - ylim = c(input$yrange[1], input$yrange[2])) + - theme(legend.position = 'none') - } else { - terra::plot(x) - lapply(split(clicklist(), by = "name"), function (x) graphics::polygon(x$x, x$y, ...) ) - } - }, res = 96, width = width, height = height) - - clicklist <- shiny::reactiveVal(data.table::data.table(x = numeric(), y = numeric(), name = character())) # empty table - shiny::observeEvent(input$plot_click, { - click_x <- input$plot_click$x - click_y <- input$plot_click$y - polygon_name <- input$polygon_name - temp <- clicklist() # get the table of past clicks - temp <- rbind(temp,data.table::data.table(x = click_x, y = click_y, name = polygon_name)) - clicklist(temp) - }) - - - output$info <- shiny::renderTable(clicklist()) - - shiny::observeEvent(input$done, { - returnValue <- clicklist() - shiny::stopApp(returnValue) - }) - } - shiny::runGadget(ui, server) + clicklist <- shiny::reactiveVal(data.table::data.table( + x = numeric(), y = numeric(), name = character())) # empty table + shiny::observeEvent(input$plot_click, { + click_x <- input$plot_click$x + click_y <- input$plot_click$y + polygon_name <- input$polygon_name + temp <- clicklist() # get the table of past clicks + temp <- rbind(temp, data.table::data.table( + x = click_x, y = click_y, name = polygon_name)) + clicklist(temp) + }) + + + output$info <- shiny::renderTable(clicklist()) + + shiny::observeEvent(input$done, { + returnValue <- clicklist() + shiny::stopApp(returnValue) + }) + } + shiny::runGadget(ui, server) } @@ -108,26 +139,31 @@ plotInteractivePolygons <- function(x, #' @param polygon_name name of polygon selections #' @param spat_unit spatial unit, default = 'cell' #' @param spat_loc_name name of spatial locations to use, default = 'raw' -#' @param polygons character. A vector with polygon names to extract cells from. If NULL, cells from all polygons are retrieved +#' @param polygons character. A vector with polygon names to extract cells +#' from. If NULL, cells from all polygons are retrieved #' -#' @return A terra 'SpatVector' with cell ID, x y coordinates, and polygon ID where each cell is located in. +#' @return A terra 'SpatVector' with cell ID, x y coordinates, and polygon ID +#' where each cell is located in. 
#' #' @export #' #' @examples -#' #' \dontrun{ #' ## Plot interactive polygons -#' my_spatPlot <- spatPlot2D(gobject = my_giotto_object, -#' show_image = TRUE, -#' point_alpha = 0.75, -#' save_plot = FALSE) +#' my_spatPlot <- spatPlot2D( +#' gobject = my_giotto_object, +#' show_image = TRUE, +#' point_alpha = 0.75, +#' save_plot = FALSE +#' ) #' my_polygon_coords <- plotInteractivePolygons(my_spatPlot) #' #' ## Add polygon coordinates to Giotto object #' my_giotto_polygons <- createGiottoPolygonsFromDfr(my_polygon_coords) -#' my_giotto_object <- addGiottoPolygons(gobject = my_giotto_object, -#' gpolygons = list(my_giotto_polygons)) +#' my_giotto_object <- addGiottoPolygons( +#' gobject = my_giotto_object, +#' gpolygons = list(my_giotto_polygons) +#' ) #' #' ## Get cells located within polygons area #' getCellsFromPolygon(my_giotto_object) @@ -137,40 +173,44 @@ plotInteractivePolygons <- function(x, #' } #' getCellsFromPolygon <- function(gobject, - polygon_name = 'selections', - spat_unit = "cell", - spat_loc_name = 'raw', - polygons = NULL) { - - if (!inherits(gobject, "giotto")) { - stop("gobject needs to be a giotto object") - } - - ## get polygons spatial info - polygon_spatVector = get_polygon_info(gobject = gobject, - polygon_name = polygon_name, - return_giottoPolygon = FALSE) - # polygon_spatVector <- slot(slot(gobject, "spatial_info")[[spat_unit]], "spatVector") - - ## get cell spatial locations - spatial_locs <- get_spatial_locations(gobject = gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name, - output = 'data.table', - copy_obj = TRUE) - - ## convert cell spatial locations to spatVector - cells_spatVector <- terra::vect(as.matrix(spatial_locs[,1:2]), - type = "points", - atts = spatial_locs) - - polygonCells <- terra::intersect(cells_spatVector, polygon_spatVector) - - if(!is.null(polygons)) { - polygonCells <- terra::subset(polygonCells, polygonCells$poly_ID %in% polygons) - } - - return(polygonCells) + polygon_name = "selections", + spat_unit = "cell", + spat_loc_name = "raw", + polygons = NULL) { + if (!inherits(gobject, "giotto")) { + stop("gobject needs to be a giotto object") + } + + ## get polygons spatial info + polygon_spatVector <- get_polygon_info( + gobject = gobject, + polygon_name = polygon_name, + return_giottoPolygon = FALSE + ) + + ## get cell spatial locations + spatial_locs <- get_spatial_locations( + gobject = gobject, + spat_unit = spat_unit, + spat_loc_name = spat_loc_name, + output = "data.table", + copy_obj = TRUE + ) + + ## convert cell spatial locations to spatVector + cells_spatVector <- terra::vect(as.matrix(spatial_locs[, 1:2]), + type = "points", + atts = spatial_locs + ) + + polygonCells <- terra::intersect(cells_spatVector, polygon_spatVector) + + if (!is.null(polygons)) { + polygonCells <- terra::subset( + polygonCells, polygonCells$poly_ID %in% polygons) + } + + return(polygonCells) } @@ -181,82 +221,93 @@ getCellsFromPolygon <- function(gobject, #' @param feat_type feature name where metadata will be added #' @param spat_unit spatial unit #' @param spat_loc_name name of spatial locations to use -#' @param polygons polygon names to plot (e.g. 'polygon_1'). If NULL, plots all available polygons -#' @param na.label polygon label for cells located outside of polygons area. Default = "no_polygon" +#' @param polygons polygon names to plot (e.g. 'polygon_1'). If NULL, plots +#' all available polygons +#' @param na.label polygon label for cells located outside of polygons area. 
+#' Default = "no_polygon" #' #' @return A Giotto object with a modified cell_metadata slot that includes the -#' polygon name where each cell is located or no_polygon label if the cell is not located -#' within a polygon area +#' polygon name where each cell is located or no_polygon label if the cell is +#' not located within a polygon area #' #' @export #' #' @examples -#' #' \dontrun{ #' ## Plot interactive polygons #' my_polygon_coords <- plotInteractivePolygons(my_spatPlot) #' #' ## Add polygon coordinates to Giotto object #' my_giotto_polygons <- createGiottoPolygonsFromDfr(my_polygon_coords) -#' my_giotto_object <- addGiottoPolygons(gobject = my_giotto_object, -#' gpolygons = list(my_giotto_polygons)) +#' my_giotto_object <- addGiottoPolygons( +#' gobject = my_giotto_object, +#' gpolygons = list(my_giotto_polygons) +#' ) #' #' ## Add polygon IDs to cell metadata #' my_giotto_object <- addPolygonCells(my_giotto_object) #' } #' addPolygonCells <- function(gobject, - polygon_name = 'selections', - spat_unit = "cell", - spat_loc_name = 'raw', - feat_type = "rna", - polygons = NULL, - na.label = "no_polygon") { - - ## verify gobject - if (!inherits(gobject, "giotto")) { - stop("gobject needs to be a giotto object") - } - - ## get cells within each polygon - polygon_cells <- data.table::as.data.table(getCellsFromPolygon(gobject = gobject, - polygon_name = polygon_name, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name, - polygons = polygons)) - data.table::setnames(polygon_cells, old = 'poly_ID', new = polygon_name) - - ## get original cell metadata - cell_metadata <- getCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - - - ## add polygon ID to cell metadata - polygon_cells = polygon_cells[,c("cell_ID", polygon_name), with = FALSE] - new_cell_metadata <- data.table::merge.data.table(x = cell_metadata, - y = polygon_cells, - by = "cell_ID", all.x = TRUE) - - ## assign a default ID to cells outside of polygons - selection_values = new_cell_metadata[[polygon_name]] - selection_values = ifelse(is.na(selection_values), na.label, selection_values) - new_cell_metadata[, c(polygon_name) := selection_values] - - #new_cell_metadata[is.na(new_cell_metadata$poly_ID), "poly_ID"] <- na.label - - ## keep original order of cells - new_cell_metadata <- new_cell_metadata[match(cell_metadata$cell_ID, - new_cell_metadata$cell_ID), ] - - gobject <- addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = new_cell_metadata[,-1]) - - return(gobject) + polygon_name = "selections", + spat_unit = "cell", + spat_loc_name = "raw", + feat_type = "rna", + polygons = NULL, + na.label = "no_polygon") { + ## verify gobject + if (!inherits(gobject, "giotto")) { + stop("gobject needs to be a giotto object") + } + + ## get cells within each polygon + polygon_cells <- data.table::as.data.table(getCellsFromPolygon( + gobject = gobject, + polygon_name = polygon_name, + spat_unit = spat_unit, + spat_loc_name = spat_loc_name, + polygons = polygons + )) + data.table::setnames(polygon_cells, old = "poly_ID", new = polygon_name) + + ## get original cell metadata + cell_metadata <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + + + ## add polygon ID to cell metadata + polygon_cells <- polygon_cells[, c("cell_ID", polygon_name), with = FALSE] + new_cell_metadata <- data.table::merge.data.table( + x = cell_metadata, + 
y = polygon_cells, + by = "cell_ID", all.x = TRUE + ) + + ## assign a default ID to cells outside of polygons + selection_values <- new_cell_metadata[[polygon_name]] + selection_values <- ifelse( + is.na(selection_values), na.label, selection_values) + new_cell_metadata[, c(polygon_name) := selection_values] + + ## keep original order of cells + new_cell_metadata <- new_cell_metadata[match( + cell_metadata$cell_ID, + new_cell_metadata$cell_ID + ), ] + + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = new_cell_metadata[, -1] + ) + + return(gobject) } @@ -267,95 +318,101 @@ addPolygonCells <- function(gobject, #' @param spat_unit spatial unit (e.g. "cell") #' @param feat_type feature type (e.g. "rna", "dna", "protein") #' @param selected_feats vector of selected features to plot -#' @param expression_values gene expression values to use ("normalized", "scaled", "custom") -#' @param method method to use to detect differentially expressed feats ("scran", "gini", "mast") +#' @param expression_values gene expression values to use +#' ("normalized", "scaled", "custom") +#' @param method method to use to detect differentially expressed feats +#' ("scran", "gini", "mast") #' @param \dots Arguments passed to \link[ComplexHeatmap]{Heatmap} #' #' @return A ComplexHeatmap::Heatmap object #' @export comparePolygonExpression <- function(gobject, - polygon_name = 'selections', - spat_unit = "cell", - feat_type = "rna", - selected_feats = "top_genes", - expression_values = "normalized", - method = "scran", - ...) { - - # verify gobject - if (!inherits(gobject, "giotto")) { - stop("gobject needs to be a giotto object") - } - - # get expression - my_expression = get_expression_values(gobject, - values = expression_values, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'matrix') - - # get cell_ID and poly_ID from metadata - my_metadata = getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - - my_metadata = my_metadata[,c("cell_ID", polygon_name), with = FALSE] - - if (length(selected_feats) == 1 && selected_feats == "top_genes") { - # find top features - scran_results <- findMarkers_one_vs_all(gobject, - spat_unit = "cell", - feat_type = "rna", - method = method, - expression_values = "normalized", - cluster_column = polygon_name, - min_feats = 10) - - selected_feats <- scran_results[, head(.SD, 2), by = 'cluster']$feats - } - # select features - my_expression <- my_expression[selected_feats,] - - # convert to data frame - my_rownames <- rownames(my_expression) - - # calculate zscore - - my_zscores <- my_expression - - for (gene in my_rownames) { - mean_expression_gene <- mean(my_expression[gene,]) - sd_expression_gene <- stats::sd(my_expression[gene,]) - for (cell in colnames(my_expression)) { - my_zscores[gene, cell] <- (my_expression[gene, cell]-mean_expression_gene)/sd_expression_gene + polygon_name = "selections", + spat_unit = "cell", + feat_type = "rna", + selected_feats = "top_genes", + expression_values = "normalized", + method = "scran", + ...) 
{ + # verify gobject + if (!inherits(gobject, "giotto")) { + stop("gobject needs to be a giotto object") + } + + # get expression + my_expression <- get_expression_values(gobject, + values = expression_values, + spat_unit = spat_unit, + feat_type = feat_type, + output = "matrix" + ) + + # get cell_ID and poly_ID from metadata + my_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + + my_metadata <- my_metadata[, c("cell_ID", polygon_name), with = FALSE] + + if (length(selected_feats) == 1 && selected_feats == "top_genes") { + # find top features + scran_results <- findMarkers_one_vs_all(gobject, + spat_unit = "cell", + feat_type = "rna", + method = method, + expression_values = "normalized", + cluster_column = polygon_name, + min_feats = 10 + ) + + selected_feats <- scran_results[, head(.SD, 2), by = "cluster"]$feats } + # select features + my_expression <- my_expression[selected_feats, ] + + # convert to data frame + my_rownames <- rownames(my_expression) + + # calculate zscore - } - - # calculate mean zscore per polygon - my_zscores_mean <- data.table::data.table(feat_ID = my_rownames) - - for(i in unique(my_metadata[[polygon_name]])) { - my_cells <- my_metadata[my_metadata[[polygon_name]] == i, "cell_ID" ] - my_sub_zscores <- my_zscores[,my_cells$cell_ID] - mean_zscores <- Matrix::rowMeans(my_sub_zscores) - my_zscores_mean <- cbind(my_zscores_mean, mean_zscores) - } - - my_zscores_mean <- as.matrix(my_zscores_mean[,-1]) - colnames(my_zscores_mean) <- unique(my_metadata[[polygon_name]]) - rownames(my_zscores_mean) <- my_rownames - - # plot heatmap - my_heatmap <- ComplexHeatmap::Heatmap(my_zscores_mean, - heatmap_legend_param = list(title = "Normalized mean z score"), - cluster_rows = FALSE, - cluster_columns = FALSE, - column_order = mixedsort(colnames(my_zscores_mean)), - ...) - return(my_heatmap) + my_zscores <- my_expression + + for (gene in my_rownames) { + mean_expression_gene <- mean(my_expression[gene, ]) + sd_expression_gene <- stats::sd(my_expression[gene, ]) + for (cell in colnames(my_expression)) { + my_zscores[gene, cell] <- ( + my_expression[gene, cell] - mean_expression_gene) / + sd_expression_gene + } + } + + # calculate mean zscore per polygon + my_zscores_mean <- data.table::data.table(feat_ID = my_rownames) + + for (i in unique(my_metadata[[polygon_name]])) { + my_cells <- my_metadata[my_metadata[[polygon_name]] == i, "cell_ID"] + my_sub_zscores <- my_zscores[, my_cells$cell_ID] + mean_zscores <- Matrix::rowMeans(my_sub_zscores) + my_zscores_mean <- cbind(my_zscores_mean, mean_zscores) + } + + my_zscores_mean <- as.matrix(my_zscores_mean[, -1]) + colnames(my_zscores_mean) <- unique(my_metadata[[polygon_name]]) + rownames(my_zscores_mean) <- my_rownames + + # plot heatmap + my_heatmap <- ComplexHeatmap::Heatmap(my_zscores_mean, + heatmap_legend_param = list(title = "Normalized mean z score"), + cluster_rows = FALSE, + cluster_columns = FALSE, + column_order = mixedsort(colnames(my_zscores_mean)), + ... + ) + return(my_heatmap) } #' Compare cell types percent per polygon @@ -370,44 +427,46 @@ comparePolygonExpression <- function(gobject, #' @return A ComplexHeatmap::Heatmap #' @export compareCellAbundance <- function(gobject, - polygon_name = 'selections', - spat_unit = "cell", - feat_type = "rna", - cell_type_column = "leiden_clus", - ...) 
{ - - # verify gobject - if (!inherits(gobject, "giotto")) { - stop("gobject needs to be a giotto object") - } - - # get poly_ID and cell_type from metadata - my_metadata <- getCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - columns_to_select = c(polygon_name, cell_type_column) - my_metadata <- my_metadata[, columns_to_select, with = FALSE] - - # count cell_type per polygon - my_cell_counts <- table(my_metadata) - - my_cell_percent <- 100*my_cell_counts/rowSums(my_cell_counts) - - # convert to matrix - my_matrix <- Matrix::as.matrix(my_cell_percent) - - rownames(my_matrix) <- rownames(my_cell_percent) - colnames(my_matrix) <- colnames(my_cell_percent) - - # plot heatmap - my_heatmap <- ComplexHeatmap::Heatmap(t_flex(my_matrix), - heatmap_legend_param = list(title = "Cell type percent\nper polygon"), - cluster_rows = FALSE, - cluster_columns = FALSE, - ...) - return(my_heatmap) + polygon_name = "selections", + spat_unit = "cell", + feat_type = "rna", + cell_type_column = "leiden_clus", + ...) { + # verify gobject + if (!inherits(gobject, "giotto")) { + stop("gobject needs to be a giotto object") + } + + # get poly_ID and cell_type from metadata + my_metadata <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + columns_to_select <- c(polygon_name, cell_type_column) + my_metadata <- my_metadata[, columns_to_select, with = FALSE] + + # count cell_type per polygon + my_cell_counts <- table(my_metadata) + + my_cell_percent <- 100 * my_cell_counts / rowSums(my_cell_counts) + + # convert to matrix + my_matrix <- Matrix::as.matrix(my_cell_percent) + + rownames(my_matrix) <- rownames(my_cell_percent) + colnames(my_matrix) <- colnames(my_cell_percent) + + # plot heatmap + my_heatmap <- ComplexHeatmap::Heatmap(t_flex(my_matrix), + heatmap_legend_param = list(title = "Cell type percent\nper polygon"), + cluster_rows = FALSE, + cluster_columns = FALSE, + ... + ) + return(my_heatmap) } @@ -417,60 +476,68 @@ compareCellAbundance <- function(gobject, #' @param polygon_name name of polygon selections #' @param x A ggplot2, spatPlot or terra::rast object #' @param spat_unit spatial unit -#' @param polygons character. Vector of polygon names to plot. If NULL, all polygons are plotted -#' @param ... Additional parameters passed to ggplot2::geom_polygon() or graphics::polygon +#' @param polygons character. Vector of polygon names to plot. If NULL, all +#' polygons are plotted +#' @param ... Additional parameters passed to ggplot2::geom_polygon() or +#' graphics::polygon #' #' @return A ggplot2 image #' @export #' plotPolygons <- function(gobject, - polygon_name = 'selections', - x, - spat_unit = "cell", - polygons = NULL, - ...) 
{ - - ## verify gobject - if (!inherits(gobject, "giotto")) { - stop("gobject must be a Giotto object") - } - - y = geom = NULL - - ## verify plot exists - if(is.null(x)) stop('A plot object must be provided') - - ## get polygons spatial info - polygon_spatVector <- get_polygon_info(gobject = gobject, - polygon_name = polygon_name, - return_giottoPolygon = FALSE) - - coordinates <- terra::geom(polygon_spatVector, df = TRUE) - - if(!is.null(polygons)) { - ## replace polygon names - for (i in seq_along(unlist(polygon_spatVector[["poly_ID"]])) ) { - coordinates$geom <- replace(coordinates$geom, - coordinates$geom == i, - unlist(polygon_spatVector[["poly_ID"]])[i]) + polygon_name = "selections", + x, + spat_unit = "cell", + polygons = NULL, + ...) { + ## verify gobject + if (!inherits(gobject, "giotto")) { + stop("gobject must be a Giotto object") } - coordinates <- coordinates[coordinates$geom %in% polygons,] - } - - ## plot over ggplot or spatPlot - if(inherits(x, "ggplot")) { - x + - geom_polygon(data = coordinates, - aes(x, y, colour = factor(geom), group = geom), - alpha = 0, show.legend = FALSE, ...) + - theme(legend.position = 'none') - } else { - terra::plot(x) - lapply(split(coordinates, by = "name"), - function (x) graphics::polygon(x$x, x$y, ...) ) - } + y <- geom <- NULL + + ## verify plot exists + if (is.null(x)) stop("A plot object must be provided") + + ## get polygons spatial info + polygon_spatVector <- get_polygon_info( + gobject = gobject, + polygon_name = polygon_name, + return_giottoPolygon = FALSE + ) + + coordinates <- terra::geom(polygon_spatVector, df = TRUE) + + if (!is.null(polygons)) { + ## replace polygon names + for (i in seq_along(unlist(polygon_spatVector[["poly_ID"]]))) { + coordinates$geom <- replace( + coordinates$geom, + coordinates$geom == i, + unlist(polygon_spatVector[["poly_ID"]])[i] + ) + } + coordinates <- coordinates[coordinates$geom %in% polygons, ] + } + + ## plot over ggplot or spatPlot + if (inherits(x, "ggplot")) { + x + + geom_polygon( + data = coordinates, + aes(x, y, colour = factor(geom), group = geom), + alpha = 0, show.legend = FALSE, ... + ) + + theme(legend.position = "none") + } else { + terra::plot(x) + lapply( + split(coordinates, by = "name"), + function(x) graphics::polygon(x$x, x$y, ...) + ) + } } # ** 3D interactive plotting #### @@ -480,107 +547,117 @@ plotPolygons <- function(gobject, #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. "cell") #' @param feat_type feature type (e.g. "rna", "dna", "protein") -#' @param cell_color character. What to color cells by (e.g. metadata col or spatial enrichment col) -#' @param cell_color_code character. discrete colors to use. Palette to use or named vector of colors +#' @param cell_color character. What to color cells by +#' (e.g. metadata col or spatial enrichment col) +#' @param cell_color_code character. discrete colors to use. Palette to use or +#' named vector of colors #' @param point_size size of point (cell) #' @param width plot width #' @param height plot height #' -#' @return data.table with selected cell_IDs, spatial coordinates, and cluster_ID. +#' @return data.table with selected cell_IDs, spatial coordinates, and +#' cluster_ID. 
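#' @examples
#' \dontrun{
#' # Hypothetical usage sketch: `my_giotto_object` is assumed to carry 3D
#' # spatial coordinates (sdimx/sdimy/sdimz) and a 'leiden_clus' metadata
#' # column; the returned data.table holds the cells selected in the gadget.
#' selected_cells <- plotInteractive3D(
#'     gobject = my_giotto_object,
#'     cell_color = "leiden_clus",
#'     point_size = 1
#' )
#' }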
#' @export #' -plotInteractive3D <- function(gobject, spat_unit = 'cell', feat_type = 'rna', - cell_color = 'leiden_clus', - cell_color_code = NULL, point_size = 0.5, - width = "100%", height = "400px") { - - # NSE vars - sdimx = sdimy = sdimz = cell_ID = NULL - - cell_metadata_table <- pDataDT(gobject, - spat_unit = spat_unit, - feat_type = feat_type) - spatial_coordinates <- getSpatialLocations(gobject, - spat_unit = spat_unit, - output = 'data.table') - - data <- merge(cell_metadata_table, spatial_coordinates) - extent <- c(xmin = min(data$sdimx), xmax = max(data$sdimx), - ymin = min(data$sdimy), ymax = max(data$sdimy), - zmin = min(data$sdimz), zmax = max(data$sdimz)) - sorted_colors <- mixedsort(unique(data[[cell_color]])) - - ui <- miniUI::miniPage( - miniUI::gadgetTitleBar("Slide to select axis range"), - miniUI::miniContentPanel( - shiny::fluidRow( - shiny::column(4, offset = 1, - # Move the slide bar to select z-axis ranges - shiny::sliderInput("xrange", label = "x-axis", - min = extent[["xmin"]], - max = extent[["xmax"]], - value = c(extent[["xmin"]], extent[["xmax"]])), - shiny::sliderInput("yrange", label = "y-axis", - min = extent[["ymin"]], - max = extent[["ymax"]], - value = c(extent[["ymin"]], extent[["ymax"]])), - shiny::sliderInput("zrange", label = "z-axis", - min = extent[["zmin"]], - max = extent[["zmax"]], - value = c(extent[["zmin"]], extent[["zmax"]])) - ), - - shiny::column(4, offset = 2, - shiny::checkboxGroupInput("clusters", - label = "clusters", - choices = sorted_colors, - selected = sorted_colors), - ) - ), - - # Plot - plotly::plotlyOutput("plot1", width = width, height = height) - ), - ) - - server <- function(input, output, session) { - - selected_data <- shiny::reactive({ - data[data[[cell_color]] %in% input$clusters, ] %>% - plotly::filter(sdimx >= input$xrange[1] & sdimx <= input$xrange[2] & - sdimy >= input$yrange[1] & sdimy <= input$yrange[2] & - sdimz >= input$zrange[1] & sdimz <= input$zrange[2]) %>% - plotly::select(cell_ID, sdimx, sdimy, sdimz, cell_color) - }) - - # Render the plot - output$plot1 <- plotly::renderPlotly({ - - selected_data <- selected_data() - - # Plot the data - plotly::plot_ly(selected_data, - x = ~sdimx, - y = ~sdimy, - z = ~sdimz, - color = as.factor(selected_data[[5]]), - colors = cell_color_code, - size = point_size) - }) +plotInteractive3D <- function(gobject, spat_unit = "cell", feat_type = "rna", + cell_color = "leiden_clus", + cell_color_code = NULL, point_size = 0.5, + width = "100%", height = "400px") { + # NSE vars + sdimx <- sdimy <- sdimz <- cell_ID <- NULL + + cell_metadata_table <- pDataDT(gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + spatial_coordinates <- getSpatialLocations(gobject, + spat_unit = spat_unit, + output = "data.table" + ) - #Handle the Done button being pressed. 
+ data <- merge(cell_metadata_table, spatial_coordinates) + extent <- c( + xmin = min(data$sdimx), xmax = max(data$sdimx), + ymin = min(data$sdimy), ymax = max(data$sdimy), + zmin = min(data$sdimz), zmax = max(data$sdimz) + ) + sorted_colors <- mixedsort(unique(data[[cell_color]])) - output$info <- shiny::renderTable(selected_data()) + ui <- miniUI::miniPage( + miniUI::gadgetTitleBar("Slide to select axis range"), + miniUI::miniContentPanel( + shiny::fluidRow( + shiny::column(4, + offset = 1, + # Move the slide bar to select z-axis ranges + shiny::sliderInput("xrange", + label = "x-axis", + min = extent[["xmin"]], + max = extent[["xmax"]], + value = c(extent[["xmin"]], extent[["xmax"]]) + ), + shiny::sliderInput("yrange", + label = "y-axis", + min = extent[["ymin"]], + max = extent[["ymax"]], + value = c(extent[["ymin"]], extent[["ymax"]]) + ), + shiny::sliderInput("zrange", + label = "z-axis", + min = extent[["zmin"]], + max = extent[["zmax"]], + value = c(extent[["zmin"]], extent[["zmax"]]) + ) + ), + shiny::column(4, + offset = 2, + shiny::checkboxGroupInput("clusters", + label = "clusters", + choices = sorted_colors, + selected = sorted_colors + ), + ) + ), + + # Plot + plotly::plotlyOutput("plot1", width = width, height = height) + ), + ) - shiny::observeEvent(input$done, { - returnValue <- selected_data() - shiny::stopApp(returnValue) - }) - } + server <- function(input, output, session) { + selected_data <- shiny::reactive({ + data[data[[cell_color]] %in% input$clusters, ] %>% + plotly::filter( + sdimx >= input$xrange[1] & sdimx <= input$xrange[2] & + sdimy >= input$yrange[1] & sdimy <= input$yrange[2] & + sdimz >= input$zrange[1] & sdimz <= input$zrange[2]) %>% + plotly::select(cell_ID, sdimx, sdimy, sdimz, cell_color) + }) + + # Render the plot + output$plot1 <- plotly::renderPlotly({ + selected_data <- selected_data() + + # Plot the data + plotly::plot_ly(selected_data, + x = ~sdimx, + y = ~sdimy, + z = ~sdimz, + color = as.factor(selected_data[[5]]), + colors = cell_color_code, + size = point_size + ) + }) + + # Handle the Done button being pressed. + + output$info <- shiny::renderTable(selected_data()) + + shiny::observeEvent(input$done, { + returnValue <- selected_data() + shiny::stopApp(returnValue) + }) + } - shiny::runGadget(ui, server) + shiny::runGadget(ui, server) } - - - - diff --git a/R/kriging.R b/R/kriging.R index 4ab332809..681419cd6 100644 --- a/R/kriging.R +++ b/R/kriging.R @@ -1,4 +1,3 @@ - # interpolateFeature #### setGeneric( "interpolateFeature", @@ -44,22 +43,19 @@ NULL #' @export setMethod( "interpolateFeature", signature(x = "giotto", y = "missing"), - function( - x, - spat_unit = NULL, - feat_type = NULL, - feats, - spatvalues_params = list(), - spat_loc_name = "raw", - ext = NULL, - buffer = 50, - name_fmt = "%s", - savedir = file.path(getwd(), "interp_rasters"), - overwrite = FALSE, - verbose = NULL, - ... - ) { - + function(x, + spat_unit = NULL, + feat_type = NULL, + feats, + spatvalues_params = list(), + spat_loc_name = "raw", + ext = NULL, + buffer = 50, + name_fmt = "%s", + savedir = file.path(getwd(), "interp_rasters"), + overwrite = FALSE, + verbose = NULL, + ...) { # This method prepares the data from the giotto object to pass # downstream where the actual interpolation happens @@ -120,7 +116,7 @@ setMethod( # Note that this object will need to be reconnected. 
interp_img_list <- do.call(interpolateFeature, args = a) - for(i in seq_along(interp_img_list)) { + for (i in seq_along(interp_img_list)) { x <- setGiotto( gobject = x, x = interp_img_list[[i]] @@ -146,17 +142,15 @@ setMethod( setMethod( "interpolateFeature", signature(x = "spatLocsObj", y = "data.frame"), - function( - x, y, - ext = NULL, - buffer = 50, - rastersize = 500, - name_fmt = "%s", - savedir = file.path(getwd(), "interp_rasters"), - overwrite = FALSE, - # cores = GiottoUtils::determine_cores(), - ... - ) { + function(x, y, + ext = NULL, + buffer = 50, + rastersize = 500, + name_fmt = "%s", + savedir = file.path(getwd(), "interp_rasters"), + overwrite = FALSE, + # cores = GiottoUtils::determine_cores(), + ...) { checkmate::assert_character(savedir) checkmate::assert_character(name_fmt) checkmate::assert_logical(overwrite) @@ -193,7 +187,6 @@ setMethod( interp_img_list <- lapply_flex( feats, function(feat) { - name <- sprintf(name_fmt, feat) filename <- file.path(savedir, paste0(name, ".tif")) @@ -201,7 +194,7 @@ setMethod( model <- gstat::gstat( id = feat, formula = as.formula(paste(feat, "~ 1")), - locations = ~sdimx + sdimy, + locations = ~ sdimx + sdimy, data = annotatedlocs, nmax = 7, set = list( @@ -267,31 +260,3 @@ setMethod( return(interp_img_list) } ) - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/R/poly_influence.R b/R/poly_influence.R index db4ea5fb2..e8f210254 100644 --- a/R/poly_influence.R +++ b/R/poly_influence.R @@ -1,11 +1,12 @@ - #' @title showPolygonSizeInfluence #' @name showPolygonSizeInfluence #' @param gobject giotto object #' @param spat_unit spatial unit -#' @param alt_spat_unit alternaitve spatial unit which represents resized polygon data +#' @param alt_spat_unit alternaitve spatial unit which represents resized +#' polygon data #' @param feat_type feature type -#' @param clus_name name of cluster column in cell_metadata for given spat_unit and alt_spat_unit, i.e. "kmeans" +#' @param clus_name name of cluster column in cell_metadata for given spat_unit +#' and alt_spat_unit, i.e. "kmeans" #' @param return_plot logical. whether to return the plot object #' @param verbose be verbose #' @return giotto object with altered cell_metadata @@ -15,115 +16,145 @@ #' New columns, resize_switch and cluster_interaction, will be created within #' cell_metadata for spat_unit-feat_type. #' -#' These new columns will describe if a given cell switched cluster number when resized. +#' These new columns will describe if a given cell switched cluster number when +#' resized. #' If the same amount of clusters exist for spat_unit-feat_type and #' alt_spat_unit-feat_type, then clusters are determined to be #' corresponding based on % overlap in cell_IDs in each cluster. #' -#' Otherwise, multiple clusters from the spatial unit feature type pair are condensed -#' to align with the smaller number of clusters and ensure overlap. +#' Otherwise, multiple clusters from the spatial unit feature type pair are +#' condensed to align with the smaller number of clusters and ensure overlap. 
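#' @examples
#' \dontrun{
#' # Hypothetical sketch: 'cell' is assumed to hold the original polygons and
#' # 'smallcell' the resized ones, both clustered with a 'kmeans' metadata
#' # column; the object names and spatial units are illustrative only.
#' my_giotto_object <- showPolygonSizeInfluence(
#'     gobject = my_giotto_object,
#'     spat_unit = "cell",
#'     alt_spat_unit = "smallcell",
#'     clus_name = "kmeans"
#' )
#' }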
#' #' @export showPolygonSizeInfluence <- function(gobject = NULL, - spat_unit = NULL, - alt_spat_unit = NULL, - feat_type = NULL, - clus_name = "kmeans", - return_plot = FALSE, - verbose = FALSE){ - # NSE vars - cell_ID = total_expr = cluster_interactions = N = resize_switch = NULL - - # Guards - if(!c("giotto") %in% class(gobject)) stop(wrap_txt("Please provide a valid Giotto Object.", errWidth=TRUE)) - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - if (!alt_spat_unit %in% names(gobject@expression)){ - stop(wrap_txt(paste0("Alternative spatial unit ", alt_spat_unit, " not found. Please ensure it exists."), errWidth = T)) - } - - meta_cols = names(getCellMetadata(gobject, spat_unit = spat_unit, feat_type = feat_type, output = "data.table")) - - if (!clus_name %in% meta_cols){ - stop(wrap_txt(paste0("Cluster name ",clus_name, " not found within cell metadata. Please ensure it exists."), errWidth = T)) - } - - - if (c("cluster_interactions") %in% meta_cols){ - warning((wrap_txt(paste0("Switch interactions already found within cell_metadata for - spat_unit feat_type pair:`", spat_unit,"-", feat_type, "`. They will be overwritten."), errWidth = T))) - } - ## Compare clustering results between cell and smallcell data ####### - # ----------------------------------------------------------------- # - cell_meta = pDataDT(gobject, spat_unit = spat_unit) - cell_meta_new = pDataDT(gobject, spat_unit = alt_spat_unit) - - new_clus_table = cell_meta_new[,.(cell_ID, total_expr)] - new_clus_table[[clus_name]] = cell_meta_new[[clus_name]] - - cell_meta = merge.data.table(cell_meta, new_clus_table, by = 'cell_ID') - - cell_meta[, cluster_interactions := paste0(cell_meta[[paste0(clus_name,'.x')]],'-',cell_meta[[paste0(clus_name,'.y')]])] - switches2 = cell_meta[, .N, by = 'cluster_interactions'] - setorder(switches2, N) - - num_orig = sort(unique(cell_meta[[paste0(clus_name,'.x')]])) - num_new = sort(unique(cell_meta[[paste0(clus_name,'.y')]])) - - equal_len = TRUE - if(length(num_orig) != length(num_new)) equal_len = FALSE - - switch_strs = c() #scope - if(!equal_len){ - ##### - switch_strs = .determine_switch_string_unequal(num_orig = num_orig, - num_new = num_new) - ##### - }else { - cmeta = pDataDT(gobject, spat_unit = spat_unit) - cmeta_new = pDataDT(gobject, spat_unit = alt_spat_unit) - - switch_strs = .determine_switch_string_equal(cell_meta = cmeta, - cell_meta_new = cmeta_new, - clus_name = clus_name) - } - - cell_meta[, resize_switch := ifelse(cluster_interactions %in% switch_strs, 'same', 'switch')] - gobject = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = cell_meta[,.(cell_ID, resize_switch, cluster_interactions)], - by_column = TRUE, - column_cell_ID = 'cell_ID') - - poly_plot = spatInSituPlotPoints(gobject = gobject, - spat_unit = spat_unit, - polygon_feat_type = spat_unit, - show_polygon = T, - feat_type = feat_type, - feats = NULL, - polygon_fill = 'resize_switch', - polygon_fill_as_factor = TRUE, - polygon_line_size = 0.1, - polygon_color = 'white', - coord_fix_ratio = 1, - polygon_fill_code = c(switch = 'red', same = 'gray'), - return_plot = return_plot) - - num_cells_switched = sum(getCellMetadata(gobject)$resize_switch == 'switch') - num_cells_same = sum(getCellMetadata(gobject)$resize_switch == 'same') - if(verbose) print(paste0(num_cells_switched, " cells switched clusters.")) - if(verbose) 
print(paste0(num_cells_same, " cells remained in the same cluster.")) - - if (return_plot) return(poly_plot) - - return (gobject) + spat_unit = NULL, + alt_spat_unit = NULL, + feat_type = NULL, + clus_name = "kmeans", + return_plot = FALSE, + verbose = FALSE) { + # NSE vars + cell_ID <- total_expr <- cluster_interactions <- N <- resize_switch <- NULL + + # Guards + if (!c("giotto") %in% class(gobject)) + stop(wrap_txt("Please provide a valid Giotto Object.", errWidth = TRUE)) + + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + if (!alt_spat_unit %in% names(gobject@expression)) { + stop(wrap_txt(paste0( + "Alternative spatial unit ", alt_spat_unit, + " not found. Please ensure it exists."), errWidth = TRUE)) + } + + meta_cols <- names(getCellMetadata( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table")) + + if (!clus_name %in% meta_cols) { + stop(wrap_txt(paste0( + "Cluster name ", clus_name, + " not found within cell metadata. Please ensure it exists."), + errWidth = TRUE)) + } + + if (c("cluster_interactions") %in% meta_cols) { + warning((wrap_txt(paste0("Switch interactions already found within + cell_metadata for spat_unit feat_type pair:`", spat_unit, "-", + feat_type, "`. They will be overwritten."), errWidth = TRUE))) + } + ## Compare clustering results between cell and smallcell data ####### + # ----------------------------------------------------------------- # + cell_meta <- pDataDT(gobject, spat_unit = spat_unit) + cell_meta_new <- pDataDT(gobject, spat_unit = alt_spat_unit) + + new_clus_table <- cell_meta_new[, .(cell_ID, total_expr)] + new_clus_table[[clus_name]] <- cell_meta_new[[clus_name]] + + cell_meta <- merge.data.table(cell_meta, new_clus_table, by = "cell_ID") + + cell_meta[, cluster_interactions := paste0(cell_meta[[ + paste0(clus_name, ".x")]], "-", cell_meta[[paste0(clus_name, ".y")]])] + switches2 <- cell_meta[, .N, by = "cluster_interactions"] + setorder(switches2, N) + + num_orig <- sort(unique(cell_meta[[paste0(clus_name, ".x")]])) + num_new <- sort(unique(cell_meta[[paste0(clus_name, ".y")]])) + + equal_len <- TRUE + if (length(num_orig) != length(num_new)) equal_len <- FALSE + + switch_strs <- c() # scope + if (!equal_len) { + ##### + switch_strs <- .determine_switch_string_unequal( + num_orig = num_orig, + num_new = num_new + ) + ##### + } else { + cmeta <- pDataDT(gobject, spat_unit = spat_unit) + cmeta_new <- pDataDT(gobject, spat_unit = alt_spat_unit) + + switch_strs <- .determine_switch_string_equal( + cell_meta = cmeta, + cell_meta_new = cmeta_new, + clus_name = clus_name + ) + } + + cell_meta[, resize_switch := ifelse( + cluster_interactions %in% switch_strs, "same", "switch")] + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = cell_meta[ + , .(cell_ID, resize_switch, cluster_interactions)], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + + poly_plot <- spatInSituPlotPoints( + gobject = gobject, + spat_unit = spat_unit, + polygon_feat_type = spat_unit, + show_polygon = TRUE, + feat_type = feat_type, + feats = NULL, + polygon_fill = "resize_switch", + polygon_fill_as_factor = TRUE, + polygon_line_size = 0.1, + polygon_color = "white", + coord_fix_ratio = 1, + polygon_fill_code = c(switch = "red", same = "gray"), + return_plot = return_plot + ) + + num_cells_switched <- sum( + 
getCellMetadata(gobject)$resize_switch == "switch") + num_cells_same <- sum(getCellMetadata(gobject)$resize_switch == "same") + if (verbose) print(paste0(num_cells_switched, " cells switched clusters.")) + if (verbose) + print(paste0(num_cells_same, " cells remained in the same cluster.")) + + if (return_plot) { + return(poly_plot) + } + + return(gobject) } #' @title Determine switch string equal @@ -138,44 +169,42 @@ showPolygonSizeInfluence <- function(gobject = NULL, #' y_m is a cluster number from the resized spatial unit #' n is the number of clusters #' -#' Clusters are determined to be corresponding based on % overlap in cell_IDs in each cluster. +#' Clusters are determined to be corresponding based on % overlap in cell_IDs +#' in each cluster. #' #' @keywords internal .determine_switch_string_equal <- function(cell_meta = NULL, - cell_meta_new = NULL, - clus_name = NULL){ - k_clusters = sort(unique(cell_meta[[clus_name]])) - num_clusters = k_clusters[length(k_clusters)] - - k_match_clusters = 1:num_clusters - switch_strs = c() - for (i in 1:num_clusters){ - thresh = 0 - clus_match = NULL - for (j in 1:num_clusters){ - - c_df = cell_meta[cell_meta[[clus_name]] == i]$cell_ID - nc_df = cell_meta_new[cell_meta_new[[clus_name]] == j]$cell_ID - - overlap = sum(c_df %in% nc_df/length(c_df)) - if (overlap > thresh){ - thresh = overlap - clus_match = j - } + cell_meta_new = NULL, + clus_name = NULL) { + k_clusters <- sort(unique(cell_meta[[clus_name]])) + num_clusters <- k_clusters[length(k_clusters)] + + k_match_clusters <- 1:num_clusters + switch_strs <- c() + for (i in 1:num_clusters) { + thresh <- 0 + clus_match <- NULL + for (j in 1:num_clusters) { + c_df <- cell_meta[cell_meta[[clus_name]] == i]$cell_ID + nc_df <- cell_meta_new[cell_meta_new[[clus_name]] == j]$cell_ID + + overlap <- sum(c_df %in% nc_df / length(c_df)) + if (overlap > thresh) { + thresh <- overlap + clus_match <- j + } + } + + k_match_clusters[i] <- clus_match } - k_match_clusters[i] = clus_match - - } - - for (idx in 1:num_clusters) { - p1 = k_clusters[[idx]] - p2 = k_match_clusters[[idx]] - switch_strs = c(switch_strs, paste0(p1,"-",p2)) - } - - return(switch_strs) + for (idx in 1:num_clusters) { + p1 <- k_clusters[[idx]] + p2 <- k_match_clusters[[idx]] + switch_strs <- c(switch_strs, paste0(p1, "-", p2)) + } + return(switch_strs) } #' @title Determine switch string unequal @@ -183,7 +212,8 @@ showPolygonSizeInfluence <- function(gobject = NULL, #' @param num_orig sorted vector of cluster numbers in the original metadata #' @param num_new sorted vector of cluster numbers in the new, resized metadata #' @return switch_str, a vector of corresponding cluster numbers in strings -#' @details determines how to create a string in the format c("x_1-y_1", "x_2-y_2"..."x_n, y_m") +#' @details determines how to create a string in the format +#' c("x_1-y_1", "x_2-y_2"..."x_n, y_m") #' Where: #' x_n is a cluster number from the original spatial unit #' y_m is a cluster number from the resized spatial unit @@ -194,27 +224,29 @@ showPolygonSizeInfluence <- function(gobject = NULL, #' #' @keywords internal .determine_switch_string_unequal <- function(num_orig = NULL, - num_new = NULL){ - - switch_strs = c() + num_new = NULL) { + switch_strs <- c() - orig_first = TRUE + orig_first <- TRUE - if(length(num_orig) < length(num_new)) orig_first = FALSE + if (length(num_orig) < length(num_new)) orig_first <- FALSE - if(orig_first){ - switch_strs = .create_switch_string_unequal(num_first = num_orig, - num_second = num_new, - switch_strs = 
switch_strs) - return(switch_strs) - } - - switch_strs = .create_switch_string_unequal(num_first = num_new, - num_second = num_orig, - switch_strs = switch_strs) + if (orig_first) { + switch_strs <- .create_switch_string_unequal( + num_first = num_orig, + num_second = num_new, + switch_strs = switch_strs + ) + return(switch_strs) + } - return(switch_strs) + switch_strs <- .create_switch_string_unequal( + num_first = num_new, + num_second = num_orig, + switch_strs = switch_strs + ) + return(switch_strs) } #' @title Create switch string unequal @@ -230,19 +262,20 @@ showPolygonSizeInfluence <- function(gobject = NULL, #' m is the number of clusters in the new spatial unit #' @keywords internal .create_switch_string_unequal <- function(num_first = NULL, - num_second = NULL, - switch_strs = NULL){ - for (o in num_first){ - for (n in num_second){ - if(as.integer(o) == as.integer(n)) switch_strs = c(switch_strs, paste0(as.character(o),"-",as.character(n))) - if(o > n && n == num_second[length(num_second)]) switch_strs = c(switch_strs, paste0(as.character(o),"-",as.character(n))) + num_second = NULL, + switch_strs = NULL) { + for (o in num_first) { + for (n in num_second) { + if (as.integer(o) == as.integer(n)) switch_strs <- c( + switch_strs, paste0(as.character(o), "-", as.character(n))) + if (o > n && n == num_second[length(num_second)]) switch_strs <- c( + switch_strs, paste0(as.character(o), "-", as.character(n))) + } } - } - switch_strs = unique(switch_strs) - - return(switch_strs) + switch_strs <- unique(switch_strs) + return(switch_strs) } #' @title showCellProportionSwitchedPie @@ -250,167 +283,195 @@ showPolygonSizeInfluence <- function(gobject = NULL, #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type -#' @details Creates a pie chart showing how many cells switched clusters after annotation resizing. -#' The function showPolygonSizeInfluence() must have been run on the Giotto Object for this function to run. +#' @details Creates a pie chart showing how many cells switched clusters after +#' annotation resizing. +#' The function showPolygonSizeInfluence() must have been run on the Giotto +#' Object for this function to run. #' @export showCellProportionSwitchedPie <- function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL) { - # NSE vars - cluster_status = num_cells = resize_switch = perc = ypos = NULL - - # Guards - if(!"giotto" %in% class(gobject)) stop(wrap_txt("Please provide a valid Giotto Object.", errWidth=TRUE)) - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - #Extract cell metadata - cmeta = getCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = "data.table") - - if (!c("resize_switch") %in% names(cmeta)){ - stop(wrap_txt("Column 'resize_switch' not found in cell metadata. 
Ensure showPolygonSizeInfluence() has been run.",errWidth = TRUE)) - } - - plotdf = data.table::data.table() - plotdf[,cluster_status := c("switch", "same")] - plotdf[,num_cells := c( sum(cmeta[,resize_switch == "switch"]), sum(cmeta[,resize_switch == "same"]))] - - per_switch = plotdf$num_cells[[1]] / sum(plotdf$num_cells) * 100 - per_same = plotdf$num_cells[[2]] / sum(plotdf$num_cells) * 100 + spat_unit = NULL, + feat_type = NULL) { + # NSE vars + cluster_status <- num_cells <- resize_switch <- perc <- ypos <- NULL + + # Guards + if (!"giotto" %in% class(gobject)) + stop(wrap_txt("Please provide a valid Giotto Object.", errWidth = TRUE)) + + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + + # Extract cell metadata + cmeta <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table" + ) + + if (!c("resize_switch") %in% names(cmeta)) { + stop(wrap_txt("Column 'resize_switch' not found in cell metadata. + Ensure showPolygonSizeInfluence() has been run.", errWidth = TRUE)) + } - y_switch = cumsum(per_switch) - 0.5 * per_switch - y_same = cumsum(per_same+per_switch) - 0.5 * per_same + plotdf <- data.table::data.table() + plotdf[, cluster_status := c("switch", "same")] + plotdf[, num_cells := c(sum(cmeta[ + , resize_switch == "switch"]), sum(cmeta[, resize_switch == "same"]))] + per_switch <- plotdf$num_cells[[1]] / sum(plotdf$num_cells) * 100 + per_same <- plotdf$num_cells[[2]] / sum(plotdf$num_cells) * 100 - plotdf[,perc := c(per_switch, per_same)] - plotdf[,ypos := c(y_switch, y_same)] + y_switch <- cumsum(per_switch) - 0.5 * per_switch + y_same <- cumsum(per_same + per_switch) - 0.5 * per_same - print(plotdf) - ggplot(as.data.frame(plotdf), aes(x="",y=perc, fill = cluster_status)) + - coord_polar("y", start=0) + geom_bar(stat="identity", width=1) + - theme_void() + - geom_text(aes(y = ypos, label = num_cells), color = "white", size = 6) + plotdf[, perc := c(per_switch, per_same)] + plotdf[, ypos := c(y_switch, y_same)] + print(plotdf) + ggplot( + as.data.frame(plotdf), aes(x = "", y = perc, fill = cluster_status)) + + coord_polar("y", start = 0) + + geom_bar(stat = "identity", width = 1) + + theme_void() + + geom_text(aes(y = ypos, label = num_cells), color = "white", size = 6) } #' @title showCellProportionSwitchedSanKey #' @name showCellProportionSwitchedSanKey -#' @param gobject giotto object which contains metadata for both spat_unit and alt_spat_unit +#' @param gobject giotto object which contains metadata for both spat_unit and +#' alt_spat_unit #' @param spat_unit spatial unit -#' @param alt_spat_unit alternative spatial unit which stores data after resizing annotations +#' @param alt_spat_unit alternative spatial unit which stores data after +#' resizing annotations #' @param feat_type feature type #' @details Creates a Sankey Diagram to illustrate cluster switching behavior. #' Currently only supports displaying cluster switching for kmeans clusters. 
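# A minimal networkD3 sketch with toy values (not Giotto output), assuming the
# networkD3 package is installed. It shows the Links/Nodes layout that the
# Sankey function below assembles, and why cluster numbers are shifted to
# zero-based indices with target indices offset past the source nodes.
links <- data.frame(
    k1    = c(0, 0, 1, 1),       # original clusters, zero-based
    k2    = c(2, 3, 2, 3),       # resized clusters, offset past the 2 source nodes
    value = c(40, 10, 5, 45)     # number of cells flowing between clusters
)
nodes <- data.frame(
    name = c("original_1", "original_2", "resized_1", "resized_2")
)
networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "k1", Target = "k2", Value = "value", NodeID = "name",
    fontSize = 12, nodeWidth = 30
)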
#' @export showCellProportionSwitchedSanKey <- function(gobject = NULL, - spat_unit = NULL, - alt_spat_unit = NULL, - feat_type = NULL){ - # NSE vars - kmeans_small <- cell_ID <- NULL - - # Guards - if(!"giotto" %in% class(gobject)) stop(wrap_txt("Please provide a valid Giotto Object.", errWidth=TRUE)) - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - if (!alt_spat_unit %in% names(gobject@expression)){ - stop(wrap_txt(paste0("Alternative spatial unit ", alt_spat_unit, " not found. Please ensure it exists."), errWidth = T)) - } - - package_check("networkD3") - - #Extract cell metadata - cmeta = getCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = "data.table") - - if (!c("resize_switch") %in% names(cmeta)){ - stop(wrap_txt("Column 'resize_switch' not found in cell metadata. Ensure showPolygonSizeInfluence() has been run.",errWidth = TRUE)) - } - - if (!c("kmeans") %in% names(cmeta)){ - stop(wrap_txt("This function has only been implemented for k-means clusters.")) - } - + spat_unit = NULL, + alt_spat_unit = NULL, + feat_type = NULL) { + # NSE vars + kmeans_small <- cell_ID <- NULL + + # Guards + if (!"giotto" %in% class(gobject)) + stop(wrap_txt("Please provide a valid Giotto Object.", errWidth = TRUE)) + + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + if (!alt_spat_unit %in% names(gobject@expression)) { + stop(wrap_txt(paste0( + "Alternative spatial unit ", alt_spat_unit, + " not found. Please ensure it exists."), errWidth = TRUE)) + } - small_cmeta = getCellMetadata(gobject = gobject, - spat_unit = alt_spat_unit, - feat_type = feat_type, - output = "data.table") + package_check("networkD3") - if (!c("kmeans") %in% names(small_cmeta)){ - stop(wrap_txt("This function has only been implemented for k-means clusters.")) - } + # Extract cell metadata + cmeta <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table" + ) - small_cmeta_clus = small_cmeta[,.(cell_ID, kmeans)] - small_cmeta_clus$kmeans_small = small_cmeta_clus$kmeans - small_cmeta_clus$kmeans = NULL + if (!c("resize_switch") %in% names(cmeta)) { + stop(wrap_txt("Column 'resize_switch' not found in cell metadata. 
+ Ensure showPolygonSizeInfluence() has been run.", errWidth = TRUE)) + } - merged_cmeta = data.table::merge.data.table(cmeta, small_cmeta_clus, by.x = "cell_ID", by.y = "cell_ID") + if (!c("kmeans") %in% names(cmeta)) { + stop("This function has only been implemented for k-means clusters.") + } - k1 = unique(merged_cmeta$kmeans) - k2 = unique(merged_cmeta$kmeans_small) + small_cmeta <- getCellMetadata( + gobject = gobject, + spat_unit = alt_spat_unit, + feat_type = feat_type, + output = "data.table" + ) - fdt = data.table::data.table() - c_k1 = c() - c_k2 = c() + if (!c("kmeans") %in% names(small_cmeta)) { + stop("This function has only been implemented for k-means clusters.") + } - flen = length(k1) * length(k2) - idx1 = 1 - idx2 = 1 + small_cmeta_clus <- small_cmeta[, .(cell_ID, kmeans)] + small_cmeta_clus$kmeans_small <- small_cmeta_clus$kmeans + small_cmeta_clus$kmeans <- NULL - for (i in 1:flen){ - c_k1[i] = k1[idx1]-1 #java zero-index - c_k2[i] = k2[idx2]-1 #java zero-index + merged_cmeta <- data.table::merge.data.table( + cmeta, small_cmeta_clus, by.x = "cell_ID", by.y = "cell_ID") - if (i%%length(k1) == 0)idx1 = idx1 + 1 - idx2 = idx2 + 1 - if ( idx2 > length(k2)) idx2 = 1 - } + k1 <- unique(merged_cmeta$kmeans) + k2 <- unique(merged_cmeta$kmeans_small) - num_occ = c() + fdt <- data.table::data.table() + c_k1 <- c() + c_k2 <- c() - for ( i in 1:flen){ - num_occ[i] = dim(na.omit(merged_cmeta[kmeans == (c_k1[i]+1)][merged_cmeta[kmeans_small == (c_k2[i]+1)]]))[[1]] - } + flen <- length(k1) * length(k2) + idx1 <- 1 + idx2 <- 1 - fdt[,"k1"] = c_k1 - fdt[,"k2"] = c_k2 + 7 - fdt[,"value"] = num_occ - fdt + for (i in 1:flen) { + c_k1[i] <- k1[idx1] - 1 # java zero-index + c_k2[i] <- k2[idx2] - 1 # java zero-index - label_dt = data.table::data.table() - label_dt[,"name"] = c(paste0("original_",as.character(sort(k1))), paste0("resized_",as.character(sort(k2)))) - label_dt + if (i %% length(k1) == 0) idx1 <- idx1 + 1 + idx2 <- idx2 + 1 + if (idx2 > length(k2)) idx2 <- 1 + } - master = list(fdt, label_dt) - names(master) = c("links", "nodes") + num_occ <- c() - networkD3::sankeyNetwork(Links = master$links, - Nodes = master$nodes, - Source = "k1", - Target = "k2", - Value = "value", - NodeID = "name", - units = "TWh", - fontSize = 12, - nodeWidth = 30) + for (i in 1:flen) { + num_occ[i] <- dim(na.omit(merged_cmeta[kmeans == (c_k1[i] + 1)][ + merged_cmeta[kmeans_small == (c_k2[i] + 1)]]))[[1]] + } + fdt[, "k1"] <- c_k1 + fdt[, "k2"] <- c_k2 + 7 + fdt[, "value"] <- num_occ + fdt + + label_dt <- data.table::data.table() + label_dt[, "name"] <- c(paste0("original_", as.character(sort(k1))), paste0( + "resized_", as.character(sort(k2)))) + label_dt + + master <- list(fdt, label_dt) + names(master) <- c("links", "nodes") + + networkD3::sankeyNetwork( + Links = master$links, + Nodes = master$nodes, + Source = "k1", + Target = "k2", + Value = "value", + NodeID = "name", + units = "TWh", + fontSize = 12, + nodeWidth = 30 + ) } diff --git a/R/python_hmrf.R b/R/python_hmrf.R index 483ff65c4..154077067 100644 --- a/R/python_hmrf.R +++ b/R/python_hmrf.R @@ -1,4 +1,3 @@ - #' @title doHMRF #' @name doHMRF #' @description Run HMRF @@ -9,283 +8,324 @@ #' @param spatial_network_name name of spatial network to use for HMRF #' @param spat_loc_name name of spatial locations #' @param spatial_genes spatial genes to use for HMRF -#' @param spatial_dimensions select spatial dimensions to use, default is all possible dimensions +#' @param spatial_dimensions select spatial dimensions to use, default is all +#' 
possible dimensions #' @param dim_reduction_to_use use another dimension reduction set as input #' @param dim_reduction_name name of dimension reduction set to use #' @param dimensions_to_use number of dimensions to use as input #' @param name name of HMRF run #' @param k number of HMRF domains -#' @param seed seed to fix random number generator (for creating initialization of HMRF) (-1 if no fixing) -#' @param betas betas to test for. three numbers: start_beta, beta_increment, num_betas e.g. c(0, 2.0, 50) +#' @param seed seed to fix random number generator +#' (for creating initialization of HMRF) (-1 if no fixing) +#' @param betas betas to test for. three numbers: start_beta, beta_increment, +#' num_betas e.g. c(0, 2.0, 50) #' @param tolerance tolerance #' @param zscore zscore #' @param numinit number of initializations #' @param python_path python path to use #' @param output_folder output folder to save results #' @param overwrite_output overwrite output folder -#' @return Creates a directory with results that can be viewed with viewHMRFresults +#' @return Creates a directory with results that can be viewed with +#' viewHMRFresults #' @details Description of HMRF parameters ... #' @export doHMRF <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - spatial_network_name = 'Delaunay_network', - spat_loc_name = 'raw', - spatial_genes = NULL, - spatial_dimensions = c('sdimx', 'sdimy', 'sdimz'), - dim_reduction_to_use = NULL, - dim_reduction_name = 'pca', - dimensions_to_use = 1:10, - seed = 100, - name = 'test', - k = 10, - betas = c(0, 2, 50), - tolerance = 1e-10, - zscore = c('none','rowcol', 'colrow'), - numinit = 100, - python_path = NULL, - output_folder = NULL, - overwrite_output = TRUE) { - - - if(!requireNamespace('smfishHmrf', quietly = TRUE)) { - stop("\n package ", 'smfishHmrf' ," is not yet installed \n", - "To install: \n", - "remotes::install_bitbucket(repo = 'qzhudfci/smfishhmrf-r', ref='master')", - "see http://spatial.rc.fas.harvard.edu/install.html for more information", - call. 
= FALSE) - } - - - # data.table set global variable - to = from = NULL - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## check or make paths - # python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } - - ## reader.py and get_result.py paths - reader_path = system.file("python", "reader2.py", package = 'Giotto') - - ## output folder - # no folder path specified - if(is.null(output_folder)) { - output_folder = paste0(getwd(),'/','HMRF_output') - if(!file.exists(output_folder)) { - dir.create(path = paste0(getwd(),'/','HMRF_output'), recursive = T) + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + spatial_network_name = "Delaunay_network", + spat_loc_name = "raw", + spatial_genes = NULL, + spatial_dimensions = c("sdimx", "sdimy", "sdimz"), + dim_reduction_to_use = NULL, + dim_reduction_name = "pca", + dimensions_to_use = 1:10, + seed = 100, + name = "test", + k = 10, + betas = c(0, 2, 50), + tolerance = 1e-10, + zscore = c("none", "rowcol", "colrow"), + numinit = 100, + python_path = NULL, + output_folder = NULL, + overwrite_output = TRUE) { + if (!requireNamespace("smfishHmrf", quietly = TRUE)) { + stop("package ", "smfishHmrf", " is not yet installed \n", + "To install: \n", + "remotes::install_bitbucket(repo = 'qzhudfci/smfishhmrf-r', ref='master')", + "see http://spatial.rc.fas.harvard.edu/install.html for more information", + call. = FALSE + ) + } + + + # data.table set global variable + to <- from <- NULL + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + ## check or make paths + # python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") + } + + ## reader.py and get_result.py paths + reader_path <- system.file("python", "reader2.py", package = "Giotto") + + ## output folder + # no folder path specified + if (is.null(output_folder)) { + output_folder <- paste0(getwd(), "/", "HMRF_output") + if (!file.exists(output_folder)) { + dir.create( + path = paste0(getwd(), "/", "HMRF_output"), recursive = TRUE) + } + } + # folder path specified + else if (!is.null(output_folder)) { + output_folder <- path.expand(output_folder) + if (!file.exists(output_folder)) { + dir.create(path = output_folder, recursive = TRUE) + } + } + + + ## first write necessary txt files to output folder ## + # cell location / spatial network / expression data and selected spatial + # genes + + ## 1. 
expression values + if (!is.null(dim_reduction_to_use)) { + + expr_values <- getDimReduction( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + reduction = "cells", + reduction_method = dim_reduction_to_use, + name = dim_reduction_name, + output = "data.table" + ) + expr_values <- expr_values[, dimensions_to_use] + expr_values <- t_flex(expr_values) + } else { + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) } - } - # folder path specified - else if(!is.null(output_folder)) { - output_folder = path.expand(output_folder) - if(!file.exists(output_folder)) { - dir.create(path = output_folder, recursive = T) + + if (!"matrix" %in% class(expr_values)) { + warning("this matrix will be converted to a dense and memory intensive + base matrix ...") + expr_values <- as.matrix(expr_values) } - } - ## first write necessary txt files to output folder ## - # cell location / spatial network / expression data and selected spatial genes + expression_file <- paste0(output_folder, "/", "expression_matrix.txt") + + # overwrite if exists + if (file.exists(expression_file) & overwrite_output == TRUE) { + message("expression_matrix.txt already exists at this location, will be + overwritten") + data.table::fwrite( + data.table::as.data.table(expr_values, keep.rownames = "gene"), + file = expression_file, quote = FALSE, col.names = TRUE, + row.names = FALSE, sep = " ") + + } else if (file.exists(expression_file) & overwrite_output == FALSE) { + message("expression_matrix.txt already exists at this location, will be + used again") + } else { + data.table::fwrite( + data.table::as.data.table(expr_values, keep.rownames = "gene"), + file = expression_file, quote = FALSE, col.names = TRUE, + row.names = FALSE, sep = " ") + } - ## 1. expression values - if(!is.null(dim_reduction_to_use)) { - #expr_values = gobject@dimension_reduction[['cells']][[dim_reduction_to_use]][[dim_reduction_name]][['coordinates']][, dimensions_to_use] - expr_values = get_dimReduction(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - reduction = 'cells', - reduction_method = dim_reduction_to_use, - name = dim_reduction_name, - output = 'data.table') - expr_values = expr_values[, dimensions_to_use] - expr_values = t_flex(expr_values) - } else { - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - } - if(!'matrix' %in% class(expr_values)) { - warning('this matrix will be converted to a dense and memory intensive base matrix ...') - expr_values = as.matrix(expr_values) - } + ## 2. 
spatial genes + if (!is.null(dim_reduction_to_use)) { + dimred_rownames <- rownames(expr_values) + spatial_genes_detected <- dimred_rownames[dimensions_to_use] + spatial_genes_detected <- spatial_genes_detected[ + !is.na(spatial_genes_detected)] + } else { + if (is.null(spatial_genes)) { + stop("you need to provide a vector of spatial genes (~500)") + } + spatial_genes_detected <- spatial_genes[ + spatial_genes %in% rownames(expr_values)] + } + spatial_genes_file <- paste0(output_folder, "/", "spatial_genes.txt") + + # overwrite if exists + if (file.exists(spatial_genes_file) & overwrite_output == TRUE) { + message("spatial_genes.txt already exists at this location, will be + overwritten") + write.table(spatial_genes_detected, + file = spatial_genes_file, + quote = FALSE, col.names = FALSE, row.names = FALSE + ) + } else if (file.exists(spatial_genes_file) & overwrite_output == FALSE) { + message("spatial_genes.txt already exists at this location, will be + used again") + } else { + write.table(spatial_genes_detected, + file = spatial_genes_file, + quote = FALSE, col.names = FALSE, row.names = FALSE + ) + } - expression_file = paste0(output_folder,'/', 'expression_matrix.txt') - # overwrite if exists - if(file.exists(expression_file) & overwrite_output == TRUE) { - cat('\n expression_matrix.txt already exists at this location, will be overwritten \n') - data.table::fwrite(data.table::as.data.table(expr_values, keep.rownames="gene"), file=expression_file, quot=F, col.names=T, row.names=F, sep=" ") - #write.table(expr_values, file = expression_file, quote = F, col.names = NA, row.names = T) - } else if(file.exists(expression_file) & overwrite_output == FALSE) { - cat('\n expression_matrix.txt already exists at this location, will be used again \n') - } else { - data.table::fwrite(data.table::as.data.table(expr_values, keep.rownames="gene"), file=expression_file, quot=F, col.names=T, row.names=F, sep=" ") - #write.table(expr_values, - # file = expression_file, - # quote = F, col.names = NA, row.names = T) - } + ## 3. spatial network + spatial_network <- getSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "networkDT" + ) + spatial_network <- spatial_network[, .(to, from)] + spatial_network_file <- paste0(output_folder, "/", "spatial_network.txt") + + if (file.exists(spatial_network_file) & overwrite_output == TRUE) { + message("spatial_network.txt already exists at this location, will be + overwritten") + write.table(spatial_network, + file = spatial_network_file, + row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t" + ) + } else if (file.exists(spatial_network_file) & overwrite_output == FALSE) { + message("spatial_network.txt already exists at this location, will be + used again") + } else { + write.table(spatial_network, + file = spatial_network_file, + row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t" + ) + } + ## 4. cell location + spatial_location <- getSpatialLocations( + gobject = gobject, + spat_unit = spat_unit, + name = spat_loc_name, + output = "data.table", + copy_obj = TRUE + ) - ## 2. 
spatial genes - if(!is.null(dim_reduction_to_use)) { - dimred_rownames = rownames(expr_values) - spatial_genes_detected = dimred_rownames[dimensions_to_use] - spatial_genes_detected = spatial_genes_detected[!is.na(spatial_genes_detected)] - } else { - if(is.null(spatial_genes)) { - stop('\n you need to provide a vector of spatial genes (~500) \n') + # select spatial dimensions that are available # + spatial_dimensions <- spatial_dimensions[ + spatial_dimensions %in% colnames(spatial_location)] + spatial_location <- spatial_location[ + , c(spatial_dimensions, "cell_ID"), with = FALSE] + spatial_location_file <- paste0( + output_folder, "/", "spatial_cell_locations.txt") + + if (file.exists(spatial_location_file) & overwrite_output == TRUE) { + message("spatial_cell_locations.txt already exists at this location, + will be overwritten") + write.table(spatial_location, + file = spatial_location_file, + row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t" + ) + } else if (file.exists(spatial_location_file)) { + message("spatial_cell_locations.txt already exists at this location, + will be used again") + } else { + write.table(spatial_location, + file = spatial_location_file, + row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t" + ) } - spatial_genes_detected = spatial_genes[spatial_genes %in% rownames(expr_values)] - } - spatial_genes_file = paste0(output_folder,'/', 'spatial_genes.txt') - - # overwrite if exists - if(file.exists(spatial_genes_file) & overwrite_output == TRUE) { - cat('\n spatial_genes.txt already exists at this location, will be overwritten \n') - write.table(spatial_genes_detected, - file = spatial_genes_file, - quote = F, col.names = F, row.names = F) - } else if(file.exists(spatial_genes_file) & overwrite_output == FALSE) { - cat('\n spatial_genes.txt already exists at this location, will be used again \n') - } else { - write.table(spatial_genes_detected, - file = spatial_genes_file, - quote = F, col.names = F, row.names = F) - } - - - - - ## 3. spatial network - spatial_network = get_spatialNetwork(gobject = gobject, - spat_unit = spat_unit, - name = spatial_network_name, - output = 'networkDT') - spatial_network = spatial_network[,.(to,from)] - spatial_network_file = paste0(output_folder,'/', 'spatial_network.txt') - - if(file.exists(spatial_network_file) & overwrite_output == TRUE) { - cat('\n spatial_network.txt already exists at this location, will be overwritten \n') - write.table(spatial_network, - file = spatial_network_file, - row.names = F, col.names = F, quote = F, sep = '\t') - } else if(file.exists(spatial_network_file) & overwrite_output == FALSE) { - cat('\n spatial_network.txt already exists at this location, will be used again \n') - } else { - write.table(spatial_network, - file = spatial_network_file, - row.names = F, col.names = F, quote = F, sep = '\t') - } - - - - - ## 4. 
cell location - spatial_location = get_spatial_locations(gobject = gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name, - output = 'data.table', - copy_obj = TRUE) - - # select spatial dimensions that are available # - spatial_dimensions = spatial_dimensions[spatial_dimensions %in% colnames(spatial_location)] - spatial_location = spatial_location[, c(spatial_dimensions,'cell_ID'), with = F] - spatial_location_file = paste0(output_folder,'/', 'spatial_cell_locations.txt') - - if(file.exists(spatial_location_file) & overwrite_output == TRUE) { - cat('\n spatial_cell_locations.txt already exists at this location, will be overwritten \n') - write.table(spatial_location, - file = spatial_location_file, - row.names = F, col.names = F, quote = F, sep = '\t') - } else if(file.exists(spatial_location_file)) { - cat('\n spatial_cell_locations.txt already exists at this location, will be used again \n') - } else { - write.table(spatial_location, - file = spatial_location_file, - row.names = F, col.names = F, quote = F, sep = '\t') - } - - - - - # prepare input paths - cell_location = paste0(output_folder,'/','spatial_cell_locations.txt') - spatial_genes = paste0(output_folder,'/','spatial_genes.txt') - spatial_network = paste0(output_folder,'/','spatial_network.txt') - expression_data = paste0(output_folder,'/', 'expression_matrix.txt') - - # create output subfolder for HMRF - output_data = paste0(output_folder,'/', 'result.spatial.zscore') - if(!file.exists(output_data)) dir.create(output_data) - - # encapsulate to avoid path problems - # python code also needs to be updated internally - cell_location = paste0('"', cell_location, '"') - spatial_genes = paste0('"', spatial_genes, '"') - spatial_network = paste0('"', spatial_network, '"') - expression_data = paste0('"', expression_data, '"') - output_data = paste0('"', output_data, '"') - - # process other params - zscore = match.arg(zscore, c('none','rowcol', 'colrow')) - betas_param = c('-b', betas) - betas_final = paste(betas_param, collapse = ' ') - - ## reader part ## - reader_command = paste0(python_path, ' ', reader_path, - ' -l ', cell_location, - ' -g ', spatial_genes, - ' -n ', spatial_network, - ' -e ', expression_data, - ' -o ', output_data, - ' -a ', name, - ' -k ', k, - ' ', betas_final, - ' -t ', tolerance, - ' -z ', zscore, - ' -s ', seed, - ' -i ', numinit) - - print(reader_command) - system(command = reader_command) - - - # store parameter results in HMRF S3 object - HMRFObj = list(name = name, - feat_type = feat_type, - output_data = output_data, - k = k, - betas = betas, - python_path = python_path) - - class(HMRFObj) <- append(class(HMRFObj), 'HMRFoutput') - - - return(HMRFObj) + + + + # prepare input paths + cell_location <- paste0(output_folder, "/", "spatial_cell_locations.txt") + spatial_genes <- paste0(output_folder, "/", "spatial_genes.txt") + spatial_network <- paste0(output_folder, "/", "spatial_network.txt") + expression_data <- paste0(output_folder, "/", "expression_matrix.txt") + + # create output subfolder for HMRF + output_data <- paste0(output_folder, "/", "result.spatial.zscore") + if (!file.exists(output_data)) dir.create(output_data) + + # encapsulate to avoid path problems + # python code also needs to be updated internally + cell_location <- paste0('"', cell_location, '"') + spatial_genes <- paste0('"', spatial_genes, '"') + spatial_network <- paste0('"', spatial_network, '"') + expression_data <- paste0('"', expression_data, '"') + output_data <- paste0('"', output_data, '"') + + # process other 
params + zscore <- match.arg(zscore, c("none", "rowcol", "colrow")) + betas_param <- c("-b", betas) + betas_final <- paste(betas_param, collapse = " ") + + ## reader part ## + reader_command <- paste0( + python_path, " ", reader_path, + " -l ", cell_location, + " -g ", spatial_genes, + " -n ", spatial_network, + " -e ", expression_data, + " -o ", output_data, + " -a ", name, + " -k ", k, + " ", betas_final, + " -t ", tolerance, + " -z ", zscore, + " -s ", seed, + " -i ", numinit + ) + + print(reader_command) + system(command = reader_command) + + + # store parameter results in HMRF S3 object + HMRFObj <- list( + name = name, + feat_type = feat_type, + output_data = output_data, + k = k, + betas = betas, + python_path = python_path + ) + + class(HMRFObj) <- append(class(HMRFObj), "HMRFoutput") + + + return(HMRFObj) } @@ -301,30 +341,30 @@ doHMRF <- function(gobject, #' @return reloads a previous ran HMRF from doHRMF #' @details Description of HMRF parameters ... #' @export -loadHMRF = function(name_used = 'test', - output_folder_used, - k_used = 10, - betas_used, - python_path_used) { - - output_data = paste0(output_folder_used,'/', 'result.spatial.zscore') - if(!file.exists(output_data)) { - stop('\n doHMRF was not run in this output directory \n') - } - - # check if it indeed exists +loadHMRF <- function(name_used = "test", + output_folder_used, + k_used = 10, + betas_used, + python_path_used) { + output_data <- paste0(output_folder_used, "/", "result.spatial.zscore") + if (!file.exists(output_data)) { + stop("doHMRF was not run in this output directory") + } - HMRFObj = list(name = name_used, - output_data = output_data, - k = k_used, - betas = betas_used, - python_path = python_path_used) + # check if it indeed exists - class(HMRFObj) <- append(class(HMRFObj), 'HMRFoutput') + HMRFObj <- list( + name = name_used, + output_data = output_data, + k = k_used, + betas = betas_used, + python_path = python_path_used + ) + class(HMRFObj) <- append(class(HMRFObj), "HMRFoutput") - return(HMRFObj) + return(HMRFObj) } @@ -342,62 +382,71 @@ loadHMRF = function(name_used = 'test', #' @seealso \code{\link{spatPlot2D}} and \code{\link{spatPlot3D}} #' @export viewHMRFresults <- function(gobject, - HMRFoutput, - k = NULL, - betas_to_view = NULL, - third_dim = FALSE, - ...) { - - - if(!'HMRFoutput' %in% class(HMRFoutput)) { - stop('\n HMRFoutput needs to be output from doHMRFextend \n') - } - - ## reader.py and get_result.py paths - # TODO: part of the package - get_result_path = system.file("python", "get_result2.py", package = 'Giotto') + HMRFoutput, + k = NULL, + betas_to_view = NULL, + third_dim = FALSE, + ...) 
{ + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRFextend") + } - # paths and name - name = HMRFoutput$name - output_data = HMRFoutput$output_data - python_path = HMRFoutput$python_path + ## reader.py and get_result.py paths + # TODO: part of the package + get_result_path <- system.file( + "python", "get_result2.py", package = "Giotto") - # k-values - if(is.null(k)) { - stop('\n you need to select a k that was used with doHMRFextend \n') - } - k = HMRFoutput$k + # paths and name + name <- HMRFoutput$name + output_data <- HMRFoutput$output_data + python_path <- HMRFoutput$python_path - # betas - betas = HMRFoutput$betas - possible_betas = seq(betas[1], to = betas[1]+(betas[2]*(betas[3]-1)), by = betas[2]) + # k-values + if (is.null(k)) { + stop("you need to select a k that was used with doHMRFextend") + } + k <- HMRFoutput$k - betas_to_view_detected = betas_to_view[betas_to_view %in% possible_betas] + # betas + betas <- HMRFoutput$betas + possible_betas <- seq(betas[1], to = betas[1] + (betas[2] * (betas[3] - 1)), + by = betas[2]) - # plot betas - for(b in betas_to_view_detected) { + betas_to_view_detected <- betas_to_view[betas_to_view %in% possible_betas] - ## get results part ## - result_command = paste0(python_path, ' ', get_result_path, - ' -r ', output_data, - ' -a ', name, - ' -k ', k, - ' -b ', b) + # plot betas + for (b in betas_to_view_detected) { + ## get results part ## + result_command <- paste0( + python_path, " ", get_result_path, + " -r ", output_data, + " -a ", name, + " -k ", k, + " -b ", b + ) - print(result_command) + print(result_command) - output = system(command = result_command, intern = T) + output <- system(command = result_command, intern = TRUE) - title_name = paste0('k = ', k, ' b = ',b) + title_name <- paste0("k = ", k, " b = ", b) - spatPlot2D(gobject = gobject, cell_color = output, show_plot = T, title = title_name, ...) + spatPlot2D( + gobject = gobject, + cell_color = output, + show_plot = TRUE, + title = title_name, + ...) - if(third_dim == TRUE) { - spatPlot3D(gobject = gobject, cell_color = output, show_plot = T, ...) + if (third_dim == TRUE) { + spatPlot3D( + gobject = gobject, + cell_color = output, + show_plot = TRUE, + ...) + } } - #visPlot(gobject = gobject, sdimz = third_dim, cell_color = output, show_plot = T, title = title_name,...) 
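# A small worked example, assuming the default betas = c(0, 2, 50) shown for
# doHMRF(): the triplet (start_beta, beta_increment, num_betas) expands into
# the individual beta values tested, exactly as the seq() call above computes.
betas <- c(0, 2, 50)
possible_betas <- seq(betas[1],
    to = betas[1] + (betas[2] * (betas[3] - 1)),
    by = betas[2])
head(possible_betas)     # 0 2 4 6 8 10
length(possible_betas)   # 50 beta values; betas_to_view must be among these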
- } } @@ -413,65 +462,65 @@ viewHMRFresults <- function(gobject, #' @return data.table with HMRF results for each b and the selected k #' @export writeHMRFresults <- function(gobject, - HMRFoutput, - k = NULL, - betas_to_view = NULL, - print_command = F) { - - - if(!'HMRFoutput' %in% class(HMRFoutput)) { - stop('\n HMRFoutput needs to be output from doHMRFextend \n') - } + HMRFoutput, + k = NULL, + betas_to_view = NULL, + print_command = FALSE) { + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRFextend") + } - ## reader.py and get_result.py paths - # TODO: part of the package - get_result_path = system.file("python", "get_result2.py", package = 'Giotto') + ## reader.py and get_result.py paths + # TODO: part of the package + get_result_path <- system.file( + "python", "get_result2.py", package = "Giotto") - # paths and name - name = HMRFoutput$name - output_data = HMRFoutput$output_data - python_path = HMRFoutput$python_path + # paths and name + name <- HMRFoutput$name + output_data <- HMRFoutput$output_data + python_path <- HMRFoutput$python_path - # k-values - if(is.null(k)) { - stop('\n you need to select a k that was used with doHMRFextend \n') - } - k = HMRFoutput$k + # k-values + if (is.null(k)) { + stop("you need to select a k that was used with doHMRFextend") + } + k <- HMRFoutput$k - # betas - betas = HMRFoutput$betas - possible_betas = seq(betas[1], to = betas[1]+(betas[2]*(betas[3]-1)), by = betas[2]) + # betas + betas <- HMRFoutput$betas + possible_betas <- seq(betas[1], to = betas[1] + (betas[2] * (betas[3] - 1)), + by = betas[2]) - betas_to_view_detected = betas_to_view[betas_to_view %in% possible_betas] + betas_to_view_detected <- betas_to_view[betas_to_view %in% possible_betas] - result_list = list() + result_list <- list() - # plot betas - for(i in seq_along(betas_to_view_detected)) { + # plot betas + for (i in seq_along(betas_to_view_detected)) { + b <- betas_to_view_detected[i] - b = betas_to_view_detected[i] + ## get results part ## + result_command <- paste0( + python_path, " ", get_result_path, + " -r ", output_data, + " -a ", name, + " -k ", k, + " -b ", b + ) - ## get results part ## - result_command = paste0(python_path, ' ', get_result_path, - ' -r ', output_data, - ' -a ', name, - ' -k ', k, - ' -b ', b) + if (print_command == TRUE) { + print(result_command) + } - if(print_command == TRUE) { - print(result_command) + output <- system(command = result_command, intern = TRUE) + title_name <- paste0("k.", k, ".b.", b) + result_list[[title_name]] <- output } - output = system(command = result_command, intern = T) - title_name = paste0('k.', k, '.b.',b) - result_list[[title_name]] = output - - } - - result_DT = data.table::as.data.table(do.call('cbind', result_list)) - result_DT = cbind(data.table::data.table('cell_ID' = gobject@cell_ID), result_DT) - return(result_DT) - + result_DT <- data.table::as.data.table(do.call("cbind", result_list)) + result_DT <- cbind(data.table::data.table( + "cell_ID" = gobject@cell_ID), result_DT) + return(result_DT) } @@ -490,102 +539,108 @@ writeHMRFresults <- function(gobject, #' @return giotto object #' @export addHMRF <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - HMRFoutput, - k = NULL, - betas_to_add = NULL, - hmrf_name = NULL) { - - - if(!'HMRFoutput' %in% class(HMRFoutput)) { - stop('\n HMRFoutput needs to be output from doHMRFextend \n') - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = 
set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # get feat_type - feat_type = HMRFoutput$feat_type - - ## reader.py and get_result.py paths - # TODO: part of the package - get_result_path = system.file("python", "get_result2.py", package = 'Giotto') - - # paths and name - name = HMRFoutput$name - output_data = HMRFoutput$output_data - python_path = HMRFoutput$python_path - - # k-values - if(is.null(k)) { - stop('\n you need to select a k that was used with doHMRFextend \n') - } - k = HMRFoutput$k - - # betas - betas = HMRFoutput$betas - possible_betas = seq(betas[1], to = betas[1]+(betas[2]*(betas[3]-1)), by = betas[2]) - - betas_to_add_detected = betas_to_add[betas_to_add %in% possible_betas] - - - # get cell metadata for object - cell_metadata = pDataDT(gobject, feat_type = feat_type) - - - # plot betas - for(b in betas_to_add_detected) { - - ## get results part ## - result_command = paste0(python_path, ' ', get_result_path, - ' -r ', output_data, - ' -a ', name, - ' -k ', k, - ' -b ', b) - print(result_command) - output = system(command = result_command, intern = T) - - # get cell_IDs - cids <- gsub(basename(output_data), "", output_data) %>% - gsub(pattern = "\"", replacement = "") %>% - list.files(pattern = "expression_matrix", - full.names = TRUE) %>% - data.table::fread(nrows = 0L, header = TRUE) %>% - colnames() - cids <- cids[-1] # gene colname is also included - - # create unique name - annot_DT = data.table::data.table( - cell_ID = cids, - temp_name = output + spat_unit = NULL, + feat_type = NULL, + HMRFoutput, + k = NULL, + betas_to_add = NULL, + hmrf_name = NULL) { + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRFextend") + } + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - if(!is.null(hmrf_name)) { - annot_name = paste0(hmrf_name,'_k', k, '_b.',b) - setnames(annot_DT, old = 'temp_name', new = annot_name) - } else { - annot_name = paste0('hmrf_k.', k, '_b.',b) - data.table::setnames(annot_DT, old = 'temp_name', new = annot_name) - } + # get feat_type + feat_type <- HMRFoutput$feat_type + ## reader.py and get_result.py paths + # TODO: part of the package + get_result_path <- system.file( + "python", "get_result2.py", package = "Giotto") - gobject = addCellMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - column_cell_ID = 'cell_ID', - new_metadata = annot_DT, - by_column = TRUE) + # paths and name + name <- HMRFoutput$name + output_data <- HMRFoutput$output_data + python_path <- HMRFoutput$python_path + # k-values + if (is.null(k)) { + stop("you need to select a k that was used with doHMRFextend") + } + k <- HMRFoutput$k + + # betas + betas <- HMRFoutput$betas + possible_betas <- seq(betas[1], to = betas[1] + (betas[2] * (betas[3] - 1)), + by = betas[2]) + + betas_to_add_detected <- betas_to_add[betas_to_add %in% possible_betas] + + + # get cell metadata for object + cell_metadata <- pDataDT(gobject, feat_type = feat_type) + + + # plot betas + for (b in betas_to_add_detected) { + ## get results part ## + result_command <- paste0( + python_path, " ", get_result_path, + " -r ", output_data, + " -a ", name, + " -k ", k, + " -b ", b + ) + print(result_command) + output <- system(command = result_command, intern = TRUE) + + # get cell_IDs + cids <- gsub(basename(output_data), "", 
output_data) %>% + gsub(pattern = "\"", replacement = "") %>% + list.files( + pattern = "expression_matrix", + full.names = TRUE + ) %>% + data.table::fread(nrows = 0L, header = TRUE) %>% + colnames() + cids <- cids[-1] # gene colname is also included + + # create unique name + annot_DT <- data.table::data.table( + cell_ID = cids, + temp_name = output + ) + + if (!is.null(hmrf_name)) { + annot_name <- paste0(hmrf_name, "_k", k, "_b.", b) + setnames(annot_DT, old = "temp_name", new = annot_name) + } else { + annot_name <- paste0("hmrf_k.", k, "_b.", b) + data.table::setnames(annot_DT, old = "temp_name", new = annot_name) + } - } - return(gobject) + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + column_cell_ID = "cell_ID", + new_metadata = annot_DT, + by_column = TRUE + ) + } + return(gobject) } @@ -604,56 +659,63 @@ addHMRF <- function(gobject, #' @seealso \code{\link{spatPlot2D}} #' @export viewHMRFresults2D <- function(gobject, - HMRFoutput, - k = NULL, - betas_to_view = NULL, - ...) { - - - if(!'HMRFoutput' %in% class(HMRFoutput)) { - stop('\n HMRFoutput needs to be output from doHMRFextend \n') - } - - ## reader.py and get_result.py paths - # TODO: part of the package - get_result_path = system.file("python", "get_result2.py", package = 'Giotto') + HMRFoutput, + k = NULL, + betas_to_view = NULL, + ...) { + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRFextend") + } - # paths and name - name = HMRFoutput$name - output_data = HMRFoutput$output_data - python_path = HMRFoutput$python_path + ## reader.py and get_result.py paths + # TODO: part of the package + get_result_path <- system.file( + "python", "get_result2.py", package = "Giotto") - # k-values - if(is.null(k)) { - stop('\n you need to select a k that was used with doHMRFextend \n') - } - k = HMRFoutput$k + # paths and name + name <- HMRFoutput$name + output_data <- HMRFoutput$output_data + python_path <- HMRFoutput$python_path - # betas - betas = HMRFoutput$betas - possible_betas = seq(betas[1], to = betas[1]+(betas[2]*(betas[3]-1)), by = betas[2]) + # k-values + if (is.null(k)) { + stop("you need to select a k that was used with doHMRFextend") + } + k <- HMRFoutput$k - betas_to_view_detected = betas_to_view[betas_to_view %in% possible_betas] + # betas + betas <- HMRFoutput$betas + possible_betas <- seq(betas[1], to = betas[1] + (betas[2] * (betas[3] - 1)), + by = betas[2]) - # plot betas - for(b in betas_to_view_detected) { + betas_to_view_detected <- betas_to_view[betas_to_view %in% possible_betas] - ## get results part ## - result_command = paste0(python_path, ' ', get_result_path, - ' -r ', output_data, - ' -a ', name, - ' -k ', k, - ' -b ', b) + # plot betas + for (b in betas_to_view_detected) { + ## get results part ## + result_command <- paste0( + python_path, " ", get_result_path, + " -r ", output_data, + " -a ", name, + " -k ", k, + " -b ", b + ) - print(result_command) + print(result_command) - output = system(command = result_command, intern = T) + output <- system(command = result_command, intern = TRUE) - title_name = paste0('k = ', k, ' b = ',b) + title_name <- paste0("k = ", k, " b = ", b) - spatPlot2D(gobject = gobject, cell_color = as.factor(output), show_plot = T, save_plot = F, title = title_name, ...) - } + spatPlot2D( + gobject = gobject, + cell_color = as.factor(output), + show_plot = TRUE, + save_plot = FALSE, + title = title_name, + ...) 
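# A usage sketch of the HMRF workflow documented in this file; 'g' and
# 'my_spatial_genes' are placeholder objects, and only argument names defined
# above are used. Commented out because it needs a Giotto object and a
# configured python environment.
# hmrf_out <- doHMRF(
#     gobject       = g,
#     spatial_genes = my_spatial_genes,     # ~500 spatial genes
#     k             = 10,
#     betas         = c(0, 2, 50),          # start, increment, number of betas
#     output_folder = "HMRF_output"
# )
# g <- addHMRF(
#     gobject      = g,
#     HMRFoutput   = hmrf_out,
#     k            = 10,
#     betas_to_add = c(0, 10, 20),          # subset of the tested betas
#     hmrf_name    = "HMRF"
# )
# viewHMRFresults2D(
#     gobject       = g,
#     HMRFoutput    = hmrf_out,
#     k             = 10,
#     betas_to_view = c(10, 20)
# )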
+ } } @@ -669,61 +731,68 @@ viewHMRFresults2D <- function(gobject, #' @seealso \code{\link{spatPlot3D}} #' @export viewHMRFresults3D <- function(gobject, - HMRFoutput, - k = NULL, - betas_to_view = NULL, - ...) { - - - if(!'HMRFoutput' %in% class(HMRFoutput)) { - stop('\n HMRFoutput needs to be output from doHMRFextend \n') - } - - ## reader.py and get_result.py paths - # TODO: part of the package - get_result_path = system.file("python", "get_result2.py", package = 'Giotto') + HMRFoutput, + k = NULL, + betas_to_view = NULL, + ...) { + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRFextend") + } - # paths and name - name = HMRFoutput$name - output_data = HMRFoutput$output_data - python_path = HMRFoutput$python_path + ## reader.py and get_result.py paths + # TODO: part of the package + get_result_path <- system.file( + "python", "get_result2.py", package = "Giotto") - # k-values - if(is.null(k)) { - stop('\n you need to select a k that was used with doHMRFextend \n') - } - k = HMRFoutput$k + # paths and name + name <- HMRFoutput$name + output_data <- HMRFoutput$output_data + python_path <- HMRFoutput$python_path - # betas - betas = HMRFoutput$betas - possible_betas = seq(betas[1], to = betas[1]+(betas[2]*(betas[3]-1)), by = betas[2]) + # k-values + if (is.null(k)) { + stop("you need to select a k that was used with doHMRFextend") + } + k <- HMRFoutput$k - betas_to_view_detected = betas_to_view[betas_to_view %in% possible_betas] + # betas + betas <- HMRFoutput$betas + possible_betas <- seq(betas[1], to = betas[1] + (betas[2] * (betas[3] - 1)), + by = betas[2]) - # plot betas - for(b in betas_to_view_detected) { + betas_to_view_detected <- betas_to_view[betas_to_view %in% possible_betas] - ## get results part ## - result_command = paste0(python_path, ' ', get_result_path, - ' -r ', output_data, - ' -a ', name, - ' -k ', k, - ' -b ', b) + # plot betas + for (b in betas_to_view_detected) { + ## get results part ## + result_command <- paste0( + python_path, " ", get_result_path, + " -r ", output_data, + " -a ", name, + " -k ", k, + " -b ", b + ) - print(result_command) + print(result_command) - output = system(command = result_command, intern = T) + output <- system(command = result_command, intern = TRUE) - title_name = paste0('k = ', k, ' b = ',b) + title_name <- paste0("k = ", k, " b = ", b) - spatPlot3D(gobject = gobject, cell_color = output, show_plot = T, save_plot = F, title = title_name, ...) - } + spatPlot3D( + gobject = gobject, + cell_color = output, + show_plot = TRUE, + save_plot = FALSE, + title = title_name, + ...) + } } -############################################################################################## +################################################################################ ### new HMRF functions V2 #' @title sampling_sp_genes @@ -734,70 +803,75 @@ viewHMRFresults3D <- function(gobject, #' @param target target length of gene list #' @param seed random seed #' @details -#' This function samples a subset of spatial genes among different clusters, with size n = target. -#' Number of samples from each cluster denpends on the relative proportion of each cluster. -#' Changing from equal size by setting sample_rate = 1 to with exact proportion of each cluster by setting sample_rate = +Inf +#' This function samples a subset of spatial genes among different clusters, +#' with size n = target. +#' Number of samples from each cluster denpends on the relative proportion of +#' each cluster. 
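# A self-contained sketch (hypothetical helper, not exported Giotto code) of
# the per-cluster sample sizes implied by sample_rate: each cluster of size
# num_g gets a weight num_g / num_g^(1 / sample_rate), rescaled so the counts
# sum to 'target'. sample_rate = 1 gives (near) equal counts per cluster, and
# a very large sample_rate approaches counts proportional to cluster size.
cluster_sizes <- c(40, 120, 340)                 # toy spatial-gene module sizes
sample_sizes <- function(num_g, sample_rate, target) {
    w <- num_g / num_g^(1 / sample_rate)         # per-cluster weight
    round(w * target / sum(w))                   # rescale to the target total
}
sample_sizes(cluster_sizes, sample_rate = 1, target = 90)    # 30 30 30
sample_sizes(cluster_sizes, sample_rate = 50, target = 90)   # ~7 22 61 (near proportional)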
+#' Changing from equal size by setting sample_rate = 1 to with exact proportion +#' of each cluster by setting sample_rate = +Inf #' @keywords internal -sampling_sp_genes = function(clust, - sample_rate=2, - target=500, - seed = 10){ - # clust = spat_cor_netw_DT$cor_clusters$spat_netw_clus - # sample_rate=2 - # target=500 - tot=0 - num_cluster=length(unique(clust)) - gene_list = list() - - for(i in seq(1, num_cluster)){ - gene_list[[i]] = colnames(t(clust[which(clust==i)])) - } - for(i in seq(1, num_cluster)){ - num_g=length(gene_list[[i]]) - tot = tot+num_g/(num_g^(1/sample_rate)) - } - factor=target/tot - num_sample=c() - genes=c() - for(i in seq(1, num_cluster)){ - num_g=length(gene_list[[i]]) - genes[i] = num_g - num_sample[i] = round(num_g/(num_g^(1/sample_rate)) * factor) - } - set.seed(seed) - samples=list() - union_genes = c() - for(i in seq(1, num_cluster)){ - if(length(gene_list[[i]])0] - gx_sorted = gx[order(gx$silhouetteRank.score, decreasing=T),] - }else if(name=="silhouetteRankTest"){ - gx = gx[!is.na(silhouetteRankTest.pval) & silhouetteRankTest.pval<1] - gx_sorted = gx[order(gx$silhouetteRankTest.pval, decreasing=F),] - } - - #print(gx_sorted) - if(method=="none"){ - if(name=="binSpect"){ - gx_sorted = gx_sorted[binSpect.pval<0.01] - }else if(name=="silhouetteRankTest"){ - gx_sorted = gx_sorted[silhouetteRankTest.pval<0.01] +filterSpatialGenes <- function( + gobject, spat_unit = NULL, feat_type = NULL, spatial_genes, max = 2500, + name = c("binSpect", "silhouetteRank", "silhouetteRankTest"), + method = c("none", "elbow")) { + name <- match.arg( + name, + unique(c("binSpect", "silhouetteRank", "silhouetteRankTest", name))) + method <- match.arg(method, unique(c("none", "elbow", method))) + + + # NSE vars + binSpect.pval <- silhouetteRank.score <- silhouetteRankTest.pval <- + feat_ID <- NULL + + # first determine how many spatial genes in this dataset + gx <- fDataDT(gobject, spat_unit = spat_unit, feat_type = feat_type) + + if (name == "binSpect") { + gx <- gx[!is.na(binSpect.pval) & binSpect.pval < 1] + gx_sorted <- gx[order(gx$binSpect.pval, decreasing = FALSE), ] + } else if (name == "silhouetteRank") { + gx <- gx[!is.na(silhouetteRank.score) & silhouetteRank.score > 0] + gx_sorted <- gx[order(gx$silhouetteRank.score, decreasing = TRUE), ] + } else if (name == "silhouetteRankTest") { + gx <- gx[!is.na(silhouetteRankTest.pval) & silhouetteRankTest.pval < 1] + gx_sorted <- gx[order(gx$silhouetteRankTest.pval, decreasing = FALSE), ] } - gx_sorted = head(gx_sorted, n=max) - - }else if(method=="elbow"){ - y0 = c() - if(name=="binSpect"){ - y0 = -log10(gx_sorted$binSpect.pval) - }else if(name=="silhouetteRankTest"){ - y0 = -log10(gx_sorted$silhouetteRankTest.pval) - }else if(name=="silhouetteRank"){ - y0 = gx_sorted$silhouetteRank.score + + # print(gx_sorted) + if (method == "none") { + if (name == "binSpect") { + gx_sorted <- gx_sorted[binSpect.pval < 0.01] + } else if (name == "silhouetteRankTest") { + gx_sorted <- gx_sorted[silhouetteRankTest.pval < 0.01] + } + gx_sorted <- head(gx_sorted, n = max) + } else if (method == "elbow") { + y0 <- c() + if (name == "binSpect") { + y0 <- -log10(gx_sorted$binSpect.pval) + } else if (name == "silhouetteRankTest") { + y0 <- -log10(gx_sorted$silhouetteRankTest.pval) + } else if (name == "silhouetteRank") { + y0 <- gx_sorted$silhouetteRank.score + } + x0 <- seq(1, nrow(gx_sorted)) + + y0s <- sort(y0) + y0s[y0s < 0] <- 0 # strictly positive + # plot(x0, y0) + slope <- (max(y0s) - min(y0s)) / length(y0s) # This is the slope of the + # line 
we want to slide. This is the diagonal. + xPt <- floor(optimize( + numPts_below_line, lower = 1, upper = length(y0s), + myVector = y0s, slope = slope)$minimum) + xPt <- length(y0s) - xPt + y_cutoff <- y0[xPt] # The y-value at this x point. This is our y_cutoff. + gx_sorted <- head(gx_sorted, n = xPt) + message("Elbow method chosen to determine number of spatial genes.") + cat(paste0("Elbow point determined to be at x=", xPt, " genes", + " y=", y_cutoff)) } - x0 = seq(1, nrow(gx_sorted)) - - y0s<-sort(y0) - y0s[y0s<0]<-0 #strictly positive - #plot(x0, y0) - slope <- (max(y0s)-min(y0s))/length(y0s) #This is the slope of the line we want to slide. This is the diagonal. - #cat(paste0("slope is ", slope, ".\n")) - #tt<-optimize(numPts_below_line,lower=1,upper=length(y0s),myVector=y0s,slope=slope) - #print(tt) - xPt <- floor(optimize(numPts_below_line,lower=1,upper=length(y0s),myVector=y0s,slope=slope)$minimum) - #Find the x-axis point where a line passing through that point has the minimum number of points below it. (ie. tangent) - xPt <- length(y0s) - xPt - y_cutoff <- y0[xPt] #The y-value at this x point. This is our y_cutoff. - gx_sorted = head(gx_sorted, n=xPt) - cat(paste0("\nElbow method chosen to determine number of spatial genes.\n")) - cat(paste0("\nElbow point determined to be at x=", xPt, " genes", " y=", y_cutoff, "\n")) - } - - #filter user's gene list (spatial_genes) - gx_sorted = gx_sorted[feat_ID %in% spatial_genes] - - num_genes_removed = length(spatial_genes) - nrow(gx_sorted) - - return(list(genes=gx_sorted$feat_ID, num_genes_removed=num_genes_removed)) + + # filter user's gene list (spatial_genes) + gx_sorted <- gx_sorted[feat_ID %in% spatial_genes] + + num_genes_removed <- length(spatial_genes) - nrow(gx_sorted) + + return(list( + genes = gx_sorted$feat_ID, num_genes_removed = num_genes_removed)) } #' @title chooseAvailableSpatialGenes #' @name chooseAvailableSpatialGenes -#' @description function to find the test name for existing spatial gene sets in Giotto +#' @description function to find the test name for existing spatial gene sets +#' in Giotto #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type #' @details -#' This function outputs the available test name for existing spatial gene sets in Giotto, +#' This function outputs the available test name for existing spatial gene sets +#' in Giotto, #' which could be used in parameter ‘name’ in `filterSpatialGenes`. -#' Priorities for showing the spatial gene test names are ‘binSpect’ > ‘silhouetteRankTest’ > ‘silhouetteRank’. +#' Priorities for showing the spatial gene test names are ‘binSpect’ > +#' ‘silhouetteRankTest’ > ‘silhouetteRank’. #' @keywords internal -chooseAvailableSpatialGenes <- function(gobject,spat_unit = NULL,feat_type = NULL){ - gx = fDataDT(gobject,spat_unit = NULL,feat_type = NULL) - eval1 = 'binSpect.pval' %in% names(gx) - eval2 = 'silhouetteRankTest.pval' %in% names(gx) - eval3 = 'silhouetteRank.score' %in% names(gx) - if(eval1==TRUE){ - return("binSpect") - }else if(eval2==TRUE){ - return("silhouetteRankTest") - }else if(eval3==TRUE){ - return("silhouetteRank") - }else{ - stop(paste0("No available spatial genes. 
Please run binSpect or silhouetteRank\n"), call.=FALSE) - } +chooseAvailableSpatialGenes <- function( + gobject, spat_unit = NULL, feat_type = NULL) { + gx <- fDataDT(gobject, spat_unit = NULL, feat_type = NULL) + eval1 <- "binSpect.pval" %in% names(gx) + eval2 <- "silhouetteRankTest.pval" %in% names(gx) + eval3 <- "silhouetteRank.score" %in% names(gx) + if (eval1 == TRUE) { + return("binSpect") + } else if (eval2 == TRUE) { + return("silhouetteRankTest") + } else if (eval3 == TRUE) { + return("silhouetteRank") + } else { + stop(paste0("No available spatial genes. Please run binSpect or + silhouetteRank\n"), call. = FALSE) + } } #' @title checkAndFixSpatialGenes #' @name checkAndFixSpatialGenes -#' @description function to check the selected test name for spatial gene set in Giotto object +#' @description function to check the selected test name for spatial gene set +#' in Giotto object #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type #' @param use_spatial_genes test name of spatial gene set to check #' @param use_score logical variable to select silhouetteRank score #' @details -#' This function checks the user specified test name of spatial gene set in Giotto object. -#' SilhouetteRank works only with score, and SilhouetteRankTest works only with pval. Use parameter use_score to specify. +#' This function checks the user specified test name of spatial gene set in +#' Giotto object. +#' SilhouetteRank works only with score, and SilhouetteRankTest works only +#' with pval. Use parameter use_score to specify. #' @keywords internal checkAndFixSpatialGenes <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - use_spatial_genes, - use_score=FALSE){ - - gx = fDataDT(gobject,spat_unit = NULL,feat_type = NULL) - - if(use_spatial_genes=="silhouetteRank"){ - if(use_score==TRUE){ - use_spatial_genes = "silhouetteRank" - }else{ - eval1 = 'silhouetteRank.score' %in% names(gx) - eval2 = 'silhouetteRankTest.pval' %in% names(gx) - if(eval1==TRUE && eval2==TRUE){ - #if both evaluate to true, then decide by use_score. - #silhouetteRank works only with score, silhouetteRankTest works only with pval - if(use_score==TRUE){ - use_spatial_genes = "silhouetteRank" - }else{ - use_spatial_genes = "silhouetteRankTest" + spat_unit = NULL, + feat_type = NULL, + use_spatial_genes, + use_score = FALSE) { + gx <- fDataDT(gobject, spat_unit = NULL, feat_type = NULL) + + if (use_spatial_genes == "silhouetteRank") { + if (use_score == TRUE) { + use_spatial_genes <- "silhouetteRank" + } else { + eval1 <- "silhouetteRank.score" %in% names(gx) + eval2 <- "silhouetteRankTest.pval" %in% names(gx) + if (eval1 == TRUE && eval2 == TRUE) { + # if both evaluate to true, then decide by use_score. + # silhouetteRank works only with score, silhouetteRankTest + # works only with pval + if (use_score == TRUE) { + use_spatial_genes <- "silhouetteRank" + } else { + use_spatial_genes <- "silhouetteRankTest" + } + } else if (eval1 == TRUE) { + use_spatial_genes <- "silhouetteRank" + } else if (eval2 == TRUE) { + use_spatial_genes <- "silhouetteRankTest" + } else { + stop(paste0("\n use_spatial_genes is set to silhouetteRank, + but it has not been run yet. Run silhouetteRank + first.\n"), call. = FALSE) + } } - }else if(eval1==TRUE){ - use_spatial_genes = "silhouetteRank" - }else if(eval2==TRUE){ - use_spatial_genes = "silhouetteRankTest" - }else{ - stop(paste0("\n use_spatial_genes is set to silhouetteRank, but it has not been run yet. 
Run silhouetteRank first.\n"), call.=FALSE) - } - } - return(use_spatial_genes) - } - else if(use_spatial_genes=="binSpect"){ - eval1 = 'binSpect.pval' %in% names(gx) - if(eval1==FALSE){ - stop(paste0("\n use_spatial_genes is set to binSpect, but it has not been run yet. Run binSpect first.\n"), call.=FALSE) + return(use_spatial_genes) + } else if (use_spatial_genes == "binSpect") { + eval1 <- "binSpect.pval" %in% names(gx) + if (eval1 == FALSE) { + stop(paste0("use_spatial_genes is set to binSpect, but it has + not been run yet. Run binSpect first."), + call. = FALSE) + } + return(use_spatial_genes) + } else { + stop(paste0("use_spatial_genes is set to one that is not supported."), + call. = FALSE) } - return(use_spatial_genes) - }else{ - stop(paste0("\n use_spatial_genes is set to one that is not supported.\n"), call.=FALSE) - } } @@ -982,443 +1072,569 @@ checkAndFixSpatialGenes <- function(gobject, #' @param expression_values expression values to use #' @param spatial_network_name name of spatial network to use for HMRF #' @param use_spatial_genes which of Giotto's spatial genes to use -#' @param use_score use score as gene selection criterion (applies when use_spatial_genes=silhouetteRank) +#' @param use_score use score as gene selection criterion +#' (applies when use_spatial_genes=silhouetteRank) #' @param gene_list_from_top total spatial genes before sampling -#' @param filter_method filter genes by top or by elbow method, prior to sampling +#' @param filter_method filter genes by top or by elbow method, prior to +#' sampling #' @param user_gene_list user-specified genes (optional) -#' @param use_pca if PCA is used on the spatial gene expression value for clustering +#' @param use_pca if PCA is used on the spatial gene expression value for +#' clustering #' @param use_pca_dim dimensions of the PCs of the selected expression #' @param gene_samples number of spatial gene subset to use for HMRF -#' @param gene_sampling_rate parameter (1-50) controlling proportion of gene samples from different module when sampling, 1 corresponding to equal gene samples between different modules; 50 corresponding to gene samples proportional to module size. +#' @param gene_sampling_rate parameter (1-50) controlling proportion of gene +#' samples from different module when sampling, 1 corresponding to equal gene +#' samples between different modules; 50 corresponding to gene samples +#' proportional to module size. 
#' @param gene_sampling_seed random number seed to sample spatial genes #' @param use_metagene if metagene expression is used for clustering #' @param cluster_metagene number of metagenes to use -#' @param top_metagene = number of genes in each cluster for the metagene calculation -#' @param existing_spatial_enrichm_to_use name of existing spatial enrichment result to use -#' @param use_neighborhood_composition if neighborhood composition is used for hmrf -#' @param spatial_network_name_for_neighborhood spatial network used to calculate neighborhood composition +#' @param top_metagene = number of genes in each cluster for the metagene +#' calculation +#' @param existing_spatial_enrichm_to_use name of existing spatial enrichment +#' result to use +#' @param use_neighborhood_composition if neighborhood composition is used for +#' hmrf +#' @param spatial_network_name_for_neighborhood spatial network used to +#' calculate neighborhood composition #' @param metadata_to_use metadata used to calculate neighborhood composition -#' @param hmrf_seed random number seed to generate initial mean vector of HMRF model -#' @param cl.method clustering method to calculate the initial mean vector, selecting from 'km', 'leiden', or 'louvain' +#' @param hmrf_seed random number seed to generate initial mean vector of HMRF +#' model +#' @param cl.method clustering method to calculate the initial mean vector, +#' selecting from 'km', 'leiden', or 'louvain' #' @param resolution.cl resolution of Leiden or Louvain clustering #' @param k number of HMRF domains #' @param tolerance error tolerance threshold #' @param zscore type of zscore to use -#' @param nstart number of Kmeans initializations from which to select the best initialization +#' @param nstart number of Kmeans initializations from which to select the +#' best initialization #' @param factor_step dampened factor step #' @param python_path python_path #' @details -#' This function is the initialization step of HMRF domain clustering. First, user specify which of Giotto's spatial genes to run, -#' through use_spatial_genes. Spatial genes have been stored in the gene metadata table. A first pass of genes will filter genes that -#' are not significantly spatial, as determined by filter_method. If filter_method is none, then top 2500 (gene_list_from_top) genes -#' ranked by pvalue are considered spatial. If filter_method is elbow, then the exact cutoff is determined by the elbow in -#' the -log10 P-value vs. gene rank plot. Second, users have a few options to decrease the dimension of the spatial genes for +#' This function is the initialization step of HMRF domain clustering. First, +#' user specify which of Giotto's spatial genes to run, +#' through use_spatial_genes. Spatial genes have been stored in the gene +#' metadata table. A first pass of genes will filter genes that +#' are not significantly spatial, as determined by filter_method. If +#' filter_method is none, then top 2500 (gene_list_from_top) genes +#' ranked by pvalue are considered spatial. If filter_method is elbow, then the +#' exact cutoff is determined by the elbow in +#' the -log10 P-value vs. gene rank plot. Second, users have a few options to +#' decrease the dimension of the spatial genes for #' clustering, listed with selection priority: #' 1. use PCA of the spatial gene expressions (selected by use_pca) #' 2. use metagene expressions (selected by use_metagene) #' 3. sampling to select 500 spatial genes (controlled by gene_samples). 
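(Illustrative aside, not part of this diff: a minimal sketch of the initialization call described above, assuming a Giotto object `g` that already has a "Delaunay_network" spatial network and binSpect results; all values shown are placeholders.)

hmrf_init <- initHMRF_V2(
    gobject = g,
    expression_values = "scaled",
    spatial_network_name = "Delaunay_network",
    use_spatial_genes = "binSpect",
    filter_method = "elbow",  # or "none" to keep the top `gene_list_from_top` genes by p-value
    gene_samples = 500,       # sample down to at most 500 spatial genes
    use_metagene = FALSE,
    cl.method = "km",         # k-means initialization of the k domain centroids
    k = 10
)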
-#' Third, once spatial genes are finalized, we are using clustering method to initialize HMRF. -#' Instead of select spatial genes for domain clustering, HMRF method could also applied on unit neighbohood composition of any group -#' membership(such as cell types), specified by parameter: use_neighborhood_composition, spatial_network_name_for_neighborhood and -#' metadata_to_use. Also HMRF provides the oppertunity for user to do clustering by any customized spatial enrichment matrix +#' Third, once spatial genes are finalized, we are using clustering method to +#' initialize HMRF. +#' Instead of select spatial genes for domain clustering, HMRF method could +#' also applied on unit neighbohood composition of any group +#' membership(such as cell types), specified by parameter: +#' use_neighborhood_composition, spatial_network_name_for_neighborhood and +#' metadata_to_use. Also HMRF provides the oppertunity for user to do +#' clustering by any customized spatial enrichment matrix #' (existing_spatial_enrichm_to_use). -#' There are 3 clustering algorithm: K-means, Leiden, and Louvain to determine initial centroids of HMRF. The initialization is -#' then finished. This function returns a list containing y (expression), nei (neighborhood structure), numnei (number of neighbors), -#' blocks (graph colors), damp (dampened factor), mu (mean), sigma (covariance), k, genes, edgelist, init.cl (initial clusters), +#' There are 3 clustering algorithm: K-means, Leiden, and Louvain to determine +#' initial centroids of HMRF. The initialization is +#' then finished. This function returns a list containing y (expression), +#' nei (neighborhood structure), numnei (number of neighbors), +#' blocks (graph colors), damp (dampened factor), mu (mean), +#' sigma (covariance), k, genes, edgelist, init.cl (initial clusters), #' spat_unit, feat_type. This information is needed for the second step, doHMRF. #' @export -initHMRF_V2 = - function (gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c("scaled", "normalized", "custom"), - spatial_network_name = "Delaunay_network", - use_spatial_genes = c("binSpect","silhouetteRank"), - use_score = FALSE, - gene_list_from_top = 2500, - filter_method = c("none", "elbow"), - user_gene_list = NULL, - use_pca = FALSE, - use_pca_dim = 1:20, - gene_samples = 500, - gene_sampling_rate = 2, - gene_sampling_seed = 10, - use_metagene = FALSE, - cluster_metagene = 50, - top_metagene = 20, - existing_spatial_enrichm_to_use = NULL, - use_neighborhood_composition = F, - spatial_network_name_for_neighborhood = NULL, - metadata_to_use = NULL, - hmrf_seed = 100, - # use.leiden = F, - cl.method = c('km','leiden','louvain'), - resolution.cl = 1, - k = 10, - tolerance = 1e-05, - zscore = c("none", "rowcol", "colrow"), - nstart = 1000, - factor_step = 1.05, - python_path = NULL) - { - wrap_msg( - "\nIf used in published research, please cite: - Q Zhu, S Shah, R Dries, L Cai, GC Yuan. - 'Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data' - Nature biotechnology 36 (12), 1183-1190. 
2018\n" - ) +initHMRF_V2 <- + function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("scaled", "normalized", "custom"), + spatial_network_name = "Delaunay_network", + use_spatial_genes = c("binSpect", "silhouetteRank"), + use_score = FALSE, + gene_list_from_top = 2500, + filter_method = c("none", "elbow"), + user_gene_list = NULL, + use_pca = FALSE, + use_pca_dim = 1:20, + gene_samples = 500, + gene_sampling_rate = 2, + gene_sampling_seed = 10, + use_metagene = FALSE, + cluster_metagene = 50, + top_metagene = 20, + existing_spatial_enrichm_to_use = NULL, + use_neighborhood_composition = FALSE, + spatial_network_name_for_neighborhood = NULL, + metadata_to_use = NULL, + hmrf_seed = 100, + cl.method = c("km", "leiden", "louvain"), + resolution.cl = 1, + k = 10, + tolerance = 1e-05, + zscore = c("none", "rowcol", "colrow"), + nstart = 1000, + factor_step = 1.05, + python_path = NULL) { + wrap_msg( + "If used in published research, please cite: + Q Zhu, S Shah, R Dries, L Cai, GC Yuan. + 'Identification of spatially associated subpopulations by combining + scRNAseq and sequential fluorescence in situ hybridization data' + Nature biotechnology 36 (12), 1183-1190. 2018\n" + ) + + package_check( + pkg_name = c( + "tidygraph", "ggraph", "dplyr", "smfishHmrf", "graphcoloring" + ), + repository = c( + "CRAN:tidygraph", + "CRAN:ggraph", + "CRAN:dplyr", + "bitbucket:qzhudfci/smfishHmrf-r", + "bitbucket:qzhudfci/graphcoloring" + ) + ) + + + # DT vars + to <- from <- clus <- NULL + + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + gx <- fDataDT(gobject, spat_unit = spat_unit, feat_type = feat_type) + cx <- pDataDT(gobject, spat_unit = spat_unit, feat_type = feat_type) + + spatial_network <- getSpatialNetwork( + gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "networkDT", + copy_obj = FALSE) + spatial_network <- spatial_network[, .(to, from)] + + if (use_neighborhood_composition) { + if (is.null(spatial_network_name_for_neighborhood)) { + stop("spatial network is required to define neighborhood, + set with \'spatial_network_name_for_neighborhood\' \n", + call. = FALSE + ) + } else if (is.null(metadata_to_use)) { + stop("please specify the cluster in meta data, set with + \'metadata_to_use\' \n", + call. = FALSE + ) + } else if (is.null(cx[[metadata_to_use]])) { + stop("please provide a valid index in meta data, set with + \'metadata_to_use\'", + call. 
= FALSE + ) + } + + cat(paste0("use spatial network composition of \'", + metadata_to_use, "\' for domain clustering")) + + name.cl <- as.character(sort(unique(cx[[metadata_to_use]]))) + + spatial_network_for_neighborhood <- getSpatialNetwork( + gobject, + spat_unit = spat_unit, + name = spatial_network_name_for_neighborhood, + output = "networkDT", + copy_obj = FALSE + ) + + from.all <- c(spatial_network_for_neighborhood$from, + spatial_network_for_neighborhood$to) + to.all <- c(spatial_network_for_neighborhood$to, + spatial_network_for_neighborhood$from) + + ct.tab <- aggregate(cx[[metadata_to_use]][match( + to.all, cx[["cell_ID"]])], + by = list(cell_ID = from.all), function(y) { + table(y)[name.cl] + } + ) + + y0 <- ct.tab[, -1] + y0[is.na(y0)] <- 0 + rownames(y0) <- ct.tab$cell_ID + y0 <- y0 / rowSums(y0) + + } else if (!is.null(existing_spatial_enrichm_to_use)) { + y0 <- getSpatialEnrichment( + gobject, + spat_unit = spat_unit, feat_type = feat_type, + name = existing_spatial_enrichm_to_use, + output = "data.table" + ) + cell_ID_enrich <- y0$cell_ID + y0 <- as.data.frame(y0[, -"cell_ID"]) + rownames(y0) <- cell_ID_enrich + + cat(paste0("Spatial enrichment result: \'", + existing_spatial_enrichm_to_use, "\' is used.")) + + if (sum(!rownames(y0) %in% cx$cell_ID) > 0) { + stop("Rownames of selected spatial enrichment result do not + match to (a subset of) Cell IDs, please fix them.", + call. = FALSE + ) + } + } else { + zscore <- match.arg(zscore, unique(c( + "none", "rowcol", "colrow", + zscore + ))) + use_spatial_genes <- match.arg(use_spatial_genes, unique(c( + "binSpect", + "silhouetteRank", use_spatial_genes + ))) + filter_method <- match.arg(filter_method, unique(c( + "none", + "elbow", filter_method + ))) + values <- match.arg(expression_values, unique(c( + "scaled", + "normalized", "custom", expression_values + ))) + expr_values <- get_expression_values( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, output = "matrix" + ) + if (zscore != "none") { + zscore <- match.arg(zscore, c("none", "colrow", "rowcol")) + expr_values <- get_expression_values( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = "normalized", + output = "matrix" + ) + if (zscore == "colrow") { + expr_values <- t(scale(t(scale(expr_values)))) + } + if (zscore == "rowcol") { + expr_values <- scale(t(scale(t(expr_values)))) + } + } + + spatial_genes <- c() + + if (!"binSpect.pval" %in% names(gx) && + !"silhouetteRank.score" %in% names(gx) && + !"silhouetteRankTest.pval" %in% names(gx)) { + stop(paste0("Giotto spatial gene detection has not been run. + Please run spatial gene detection first: binSpect, + silhouetteRank."), + call. = FALSE + ) + } + + if (!is.null(user_gene_list)) { + message("User supplied gene list detected.") + message("Checking user gene list is spatial...") + + use_spatial_genes <- chooseAvailableSpatialGenes(gobject) + filtered <- filterSpatialGenes( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_genes = user_gene_list, + max = gene_list_from_top, + name = use_spatial_genes, + method = filter_method + ) + if (filtered$num_genes_removed > 0) { + cat(paste0( + "Removed ", filtered$num_genes_removed, + " from user's input gene list due to being absent or + non-spatial genes." + )) + cat(paste0("Kept ", length(filtered$genes), + " spatial genes for next step")) + } + spatial_genes <- filtered$genes + + if (length(spatial_genes) == 0) { + stop("No genes are remaining to do HMRF. 
Please give a + larger gene list.", + call. = FALSE + ) + } + } else { + cat(paste0( + "Choosing spatial genes from the results of ", + use_spatial_genes + )) + use_spatial_genes <- checkAndFixSpatialGenes( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + use_spatial_genes = use_spatial_genes, + use_score = use_score + ) + all_genes <- gx$feat_ID + filtered <- filterSpatialGenes( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_genes = all_genes, + max = gene_list_from_top, + name = use_spatial_genes, + method = filter_method + ) + cat(paste0("Kept ", length(filtered$genes), + " top spatial genes for next step")) + spatial_genes <- filtered$genes + } + + if (use_pca == TRUE) { + expr_values <- expr_values[spatial_genes, ] + pc.expr <- prcomp(expr_values)[[2]] + use_pca_dim <- use_pca_dim[use_pca_dim %in% 1:ncol(pc.expr)] + y0 <- (pc.expr[, use_pca_dim]) + } else { + message("Computing spatial coexpression modules...") + spat_cor_netw_DT <- detectSpatialCorFeats( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = values, + method = "network", + spatial_network_name = spatial_network_name, + subset_feats = spatial_genes, + network_smoothing = 0 + ) + + if (use_metagene == FALSE) { + n <- min(gene_samples, 500, length(spatial_genes)) + if (n < length(spatial_genes)) { + spat_cor_netw_DT <- clusterSpatialCorFeats( + spat_cor_netw_DT, + name = "spat_netw_clus", + k = 20 + ) + message("Sampling spatial genes from coexpression + modules...") + sample_genes <- sampling_sp_genes( + spat_cor_netw_DT$cor_clusters$spat_netw_clus, + sample_rate = gene_sampling_rate, + target = n, + seed = gene_sampling_seed + ) + spatial_genes_selected <- sample_genes$union_genes + cat(paste0( + "Sampled ", length(spatial_genes_selected), + " genes." + )) + } else { + spatial_genes_selected <- spatial_genes + } + cat(paste0( + "Will use ", length(spatial_genes_selected), + "spatial genes for initialization of HMRF." 
+ )) + expr_values <- expr_values[spatial_genes_selected, ] + } else { + k.sp <- min( + ceiling(length(spatial_genes) / 20), cluster_metagene) + if (k.sp < cluster_metagene) { + cat(paste0( + "construct ", k.sp, + " coexpression modules due to limited gene size...")) + } + spat_cor_netw_DT <- clusterSpatialCorFeats(spat_cor_netw_DT, + name = "spat_netw_clus", k = k.sp + ) + + cluster_genes_DT <- showSpatialCorFeats(spat_cor_netw_DT, + use_clus_name = "spat_netw_clus", + show_top_feats = 1 + ) + + cat(paste0("Collecting top spatial genes and calculating + metagenes from ", k.sp, " coexpression modules...")) + + top_per_module <- cluster_genes_DT[ + , head(.SD, top_metagene), by = clus] + cluster_genes <- top_per_module$clus + names(cluster_genes) <- top_per_module$feat_ID + + meta.genes <- createMetafeats( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = values, + feat_clusters = cluster_genes, + return_gobject = FALSE + ) + + expr_values <- t(meta.genes@enrichDT[, 1:k.sp]) + colnames(expr_values) <- unlist( + meta.genes@enrichDT[, "cell_ID"]) + rownames(expr_values) <- paste0( + "metagene_", rownames(expr_values)) + } + + y0 <- t(as.matrix(expr_values)) + } + } - package_check( - pkg_name = c( - "tidygraph", "ggraph", "dplyr", "smfishHmrf", "graphcoloring" - ), - repository = c( - "CRAN:tidygraph", - "CRAN:ggraph", - "CRAN:dplyr", - "bitbucket:qzhudfci/smfishHmrf-r", - "bitbucket:qzhudfci/graphcoloring" - ) - ) + cell.rm <- setdiff(rownames(y0), unique(c( + spatial_network$to, + spatial_network$from + ))) + + if (length(cell.rm) > 0) { + y0 <- y0[-match(cell.rm, rownames(y0)), ] + } + ############################## + ## scale y matrix on each sample + y <- t(scale(t(y0))) - # DT vars - to = from = clus = NULL - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - gx = fDataDT(gobject,spat_unit = spat_unit,feat_type = feat_type) - cx = pDataDT(gobject,spat_unit = spat_unit,feat_type = feat_type) - - spatial_network = get_spatialNetwork(gobject, spat_unit = spat_unit, name = spatial_network_name, output ='networkDT', copy_obj = FALSE) - spatial_network = spatial_network[, .(to, from)] - - if(use_neighborhood_composition) - { - if(is.null(spatial_network_name_for_neighborhood)) - { - stop("spatial network is required to define neighborhood, set with \'spatial_network_name_for_neighborhood\' \n", - call. = FALSE) - }else if(is.null(metadata_to_use)) - { - stop("please specify the cluster in meta data, set with \'metadata_to_use\' \n", - call. = FALSE) - }else if(is.null(cx[[metadata_to_use]])) - { - stop("please provide a valid index in meta data, set with \'metadata_to_use\' \n", - call. 
= FALSE) - } - - cat(paste0("\n use spatial network composition of \'",metadata_to_use, "\' for domain clustering \n")) - - name.cl = as.character(sort(unique(cx[[metadata_to_use]]))) - - spatial_network_for_neighborhood = get_spatialNetwork(gobject,spat_unit = spat_unit, - name = spatial_network_name_for_neighborhood, output ='networkDT', copy_obj = FALSE) - - from.all = c(spatial_network_for_neighborhood$from,spatial_network_for_neighborhood$to) - to.all = c(spatial_network_for_neighborhood$to,spatial_network_for_neighborhood$from) - - ct.tab = aggregate(cx[[metadata_to_use]][match(to.all,cx[['cell_ID']])], - by = list(cell_ID = from.all),function(y){table(y)[name.cl]}) - - y0 = ct.tab[,-1] - y0[is.na(y0)] = 0 - rownames(y0) = ct.tab$cell_ID - y0 = y0/rowSums(y0) - - # cell.rm = setdiff(rownames(y0), unique(c(spatial_network$to, - # spatial_network$from))) - # - # if (length(cell.rm) > 0) - # y0 = y0[-match(cell.rm, rownames(y0)), ] - # ################################################ - # ## is scale y0 matrix on each sample needed? - # y = y0 - - }else if(!is.null(existing_spatial_enrichm_to_use)) - { - y0 = getSpatialEnrichment(gobject,spat_unit = spat_unit,feat_type = feat_type, - name = existing_spatial_enrichm_to_use,output = 'data.table') - cell_ID_enrich = y0$cell_ID - y0 = as.data.frame(y0[,-'cell_ID']) - rownames(y0) = cell_ID_enrich - - cat(paste0("\n Spatial enrichment result: \'",existing_spatial_enrichm_to_use,"\' is used.\n")) - - if(sum(!rownames(y0)%in%cx$cell_ID)>0) - {stop("\n Rownames of selected spatial enrichment result do not match to (a subset of) Cell IDs, please fix them. \n", - call. = FALSE)} - }else{ - zscore = match.arg(zscore, unique(c("none", "rowcol", "colrow", - zscore))) - use_spatial_genes = match.arg(use_spatial_genes, unique(c("binSpect", - "silhouetteRank", use_spatial_genes))) - filter_method = match.arg(filter_method, unique(c("none", - "elbow", filter_method))) - values = match.arg(expression_values, unique(c("scaled", - "normalized", "custom", expression_values))) - expr_values = get_expression_values(gobject = gobject, spat_unit = spat_unit, feat_type = feat_type, - values = values, output = 'matrix') - if (zscore != "none") { - zscore = match.arg(zscore, c("none", "colrow", "rowcol")) - expr_values = get_expression_values(gobject = gobject, spat_unit = spat_unit, feat_type = feat_type, - values = "normalized", output = 'matrix') - if (zscore == "colrow") { - expr_values = t(scale(t(scale(expr_values)))) + ## do not scale y if using neighborhood composition + if (use_neighborhood_composition) { + y <- y0 } - if (zscore == "rowcol") { - expr_values = scale(t(scale(t(expr_values)))) + + + numcell <- dim(y)[1] + m <- dim(y)[2] + ncol.nei <- max(table(c(spatial_network$to, spatial_network$from))) + nei <- matrix(-1, ncol = ncol.nei, nrow = numcell) + rownames(nei) <- rownames(y) + for (i in 1:numcell) { + nei.i <- c(spatial_network$from[spatial_network$to == + rownames(nei)[i]], spatial_network$to[spatial_network$from == + rownames(nei)[i]]) + if (length(nei.i) > 0) { + nei[i, seq_along(nei.i)] <- sort(match(nei.i, rownames(y))) + } } - } - - spatial_genes = c() - - if (!"binSpect.pval" %in% names(gx) && - !"silhouetteRank.score" %in% names(gx) && - !"silhouetteRankTest.pval" %in% names(gx)) { - stop(paste0("Giotto spatial gene detection has not been run. Please run spatial gene detection first: binSpect, silhouetteRank.\n"), - call. 
= FALSE) - } - - if (!is.null(user_gene_list)) { - cat(paste0("\n User supplied gene list detected.\n")) - cat(paste0("\n Checking user gene list is spatial...\n")) - # if (!"binSpect.pval" %in% names(gobject@gene_metadata) && - # !"silhouetteRank.score" %in% names(gobject@gene_metadata) && - # !"silhouetteRankTest.pval" %in% names(gobject@gene_metadata)) { - # stop(paste0("\n Giotto's spatial gene detection has not been run. Cannot check user's gene list. Please run spatial gene detection first: binSpect, silhouetteRank, silhouetteRankTest.\n"), - # call. = FALSE) - # } - use_spatial_genes = chooseAvailableSpatialGenes(gobject) - filtered = filterSpatialGenes(gobject, spat_unit = spat_unit,feat_type = feat_type, spatial_genes = user_gene_list, - max = gene_list_from_top, name = use_spatial_genes, - method = filter_method) - if (filtered$num_genes_removed > 0) { - cat(paste0("\n Removed ", filtered$num_genes_removed, - " from user's input gene list due to being absent or non-spatial genes.\n")) - cat(paste0("\n Kept ", length(filtered$genes), " spatial genes for next step\n")) + numnei <- as.integer(rowSums(nei != (-1))) + nn <- nei + numedge <- 0 + for (i in 1:numcell) { + numedge <- numedge + length(nn[i, nn[i, ] != -1]) } - spatial_genes = filtered$genes - - if (length(spatial_genes) == 0) { - stop(paste0("\n No genes are remaining to do HMRF. Please give a larger gene list.\n"), - call. = FALSE) + edgelist <- matrix(0, nrow = numedge, ncol = 2) + edge_ind <- 1 + for (i in 1:numcell) { + neighbors <- nn[i, nn[i, ] != -1] + for (j in seq_along(neighbors)) { + edgelist[edge_ind, ] <- c(i, neighbors[j]) + edge_ind <- edge_ind + 1 + } } - }else{ - # if (is.null(user_gene_list)) { - cat(paste0("\n Choosing spatial genes from the results of ", - use_spatial_genes, "\n")) - use_spatial_genes = checkAndFixSpatialGenes(gobject, spat_unit = spat_unit,feat_type = feat_type, - use_spatial_genes = use_spatial_genes, use_score = use_score) - all_genes = gx$feat_ID - filtered = filterSpatialGenes(gobject, spat_unit = spat_unit,feat_type = feat_type, - spatial_genes = all_genes, max = gene_list_from_top, - name = use_spatial_genes, method = filter_method) - cat(paste0("\n Kept ", length(filtered$genes), " top spatial genes for next step\n")) - spatial_genes = filtered$genes - } - - if(use_pca == T) - { - expr_values = expr_values[spatial_genes, ] - pc.expr = prcomp(expr_values)[[2]] - use_pca_dim = use_pca_dim[use_pca_dim %in% 1:ncol(pc.expr)] - y0 = (pc.expr[,use_pca_dim]) - }else{ - cat(paste0("\n Computing spatial coexpression modules...\n")) - spat_cor_netw_DT = detectSpatialCorFeats(gobject = gobject, - feat_type = feat_type, spat_unit = spat_unit, expression_values = values, - method = "network", spatial_network_name = spatial_network_name, - subset_feats = spatial_genes, network_smoothing = 0) - - if(use_metagene==F) - { - n = min(gene_samples, 500, length(spatial_genes)) - if (n < length(spatial_genes)) { - spat_cor_netw_DT = clusterSpatialCorFeats(spat_cor_netw_DT, - name = "spat_netw_clus", k = 20) - cat(paste0("\n Sampling spatial genes from coexpression modules...\n")) - sample_genes = sampling_sp_genes(spat_cor_netw_DT$cor_clusters$spat_netw_clus, - sample_rate = gene_sampling_rate, target = n, seed = gene_sampling_seed) - spatial_genes_selected = sample_genes$union_genes - cat(paste0("\n Sampled ", length(spatial_genes_selected), - " genes.\n")) - } - else { - spatial_genes_selected = spatial_genes - } - cat(paste0("\n Will use ", length(spatial_genes_selected), - "spatial genes for 
initialization of HMRF.\n")) - expr_values = expr_values[spatial_genes_selected, ] - - }else{ - # cat(paste0("\n Computing spatial coexpression modules...\n")) - # spat_cor_netw_DT = detectSpatialCorGenes(gobject = gobject, - # feat_type = feat_type, spat_unit = spat_unit, expression_values = values, - # method = "network", spatial_network_name = spatial_network_name, - # subset_feats = spatial_genes, network_smoothing = 0) - k.sp = min(ceiling(length(spatial_genes)/20),cluster_metagene) - if(k.sp% dplyr::mutate( + color = as.factor(graphcoloring::color_dsatur())) + colors <- as.list(yy)$nodes$color + cl_color <- sort(unique(colors)) + blocks <- lapply(cl_color, function(cl) { + which(colors == cl) + }) + + cl.method <- tolower(cl.method) + if (!cl.method %in% c("km", "leiden", "louvain")) { + cl.method <- "km" + message("clustering method not specified, use kmeans as default...") } - y0 = t(as.matrix(expr_values)) - - } - - } - - cell.rm = setdiff(rownames(y0), unique(c(spatial_network$to, - spatial_network$from))) - - if (length(cell.rm) > 0) - y0 = y0[-match(cell.rm, rownames(y0)), ] - - ############################## - ## scale y matrix on each sample - y = t(scale(t(y0))) - - ## do not scale y if using neighborhood composition - if(use_neighborhood_composition){ - y = y0 - } - - # if(use_pca ==T) - # { - # pc.y = prcomp(t(y))[[2]] - # use_pca_dim = use_pca_dim[use_pca_dim %in% 1:ncol(pc.y)] - # y = (pc.y[,use_pca_dim]) - # - # colnames(y) = paste0(ifelse(use_metagene,yes = 'meta_',no = 'spatialgene_'),'PC',use_pca_dim) - # } - - numcell <- dim(y)[1] - m <- dim(y)[2] - ncol.nei = max(table(c(spatial_network$to, spatial_network$from))) - nei = matrix(-1, ncol = ncol.nei, nrow = numcell) - rownames(nei) = rownames(y) - for (i in 1:numcell) { - nei.i = c(spatial_network$from[spatial_network$to == - rownames(nei)[i]], spatial_network$to[spatial_network$from == - rownames(nei)[i]]) - if (length(nei.i) > 0) - nei[i, seq_along(nei.i)] = sort(match(nei.i, rownames(y))) - } - numnei <- as.integer(rowSums(nei != (-1))) - nn <- nei - numedge <- 0 - for (i in 1:numcell) { - numedge <- numedge + length(nn[i, nn[i, ] != -1]) - } - edgelist <- matrix(0, nrow = numedge, ncol = 2) - edge_ind <- 1 - for (i in 1:numcell) { - neighbors <- nn[i, nn[i, ] != -1] - for (j in seq_along(neighbors)) { - edgelist[edge_ind, ] <- c(i, neighbors[j]) - edge_ind <- edge_ind + 1 - } - } - cat(paste0("\n Parsing neighborhood graph...\n")) - pp <- tidygraph::tbl_graph(edges = as.data.frame(edgelist), directed = FALSE) - yy <- pp %>% dplyr::mutate(color = as.factor(graphcoloring::color_dsatur())) - colors <- as.list(yy)$nodes$color - cl_color <- sort(unique(colors)) - blocks <- lapply(cl_color, function(cl) { - which(colors == cl) - }) - - cl.method = tolower(cl.method) - if(!cl.method%in%c('km','leiden','louvain')) - { - cl.method ='km' - cat(paste0("\n clustering method not specified, use kmeans as default...\n")) - } - - # if(use.leiden == F) - if(cl.method == 'km') - { - cat(paste0("\n Kmeans initialization...\n")) - kk = smfishHmrf::smfishHmrf.generate.centroid( - y = y, par_k = k, par_seed = hmrf_seed, - nstart = nstart - ) - mu <- t(kk$centers) - lclust <- lapply(1:k, function(x) which(kk$cluster == x)) - - }else{ - ##### need to double check leiden and louvain cluster functions - gobject@dimension_reduction$cells$spatial <- NULL - gobject@dimension_reduction$cells$spatial$spatial_feat <- NULL - gobject@dimension_reduction$cells$spatial$spatial_feat$name <- 'spatial_feat' - 
gobject@dimension_reduction$cells$spatial$spatial_feat$reduction_method <- 'spatial' - gobject@dimension_reduction$cells$spatial$spatial_feat$coordinates <- y - - gobject <- createNearestNetwork(gobject = gobject, - dim_reduction_to_use = 'spatial',dim_reduction_name = 'spatial_feat',dimensions_to_use = 1:ncol(y), - name = 'sNN.initHMRF') - - if(cl.method == 'leiden'){ - cat(paste0("\n Leiden clustering initialization...\n")) - leiden.cl <- doLeidenCluster(gobject = gobject,nn_network_to_use = 'sNN',network_name = 'sNN.initHMRF',set_seed = hmrf_seed,return_gobject = F, - python_path = python_path,resolution = resolution.cl) - cl.match = leiden.cl$leiden_clus[match(rownames(y),leiden.cl$cell_ID)] - mu <- aggregate(y,by = list(cl.match),FUN = mean) - }else if(cl.method == 'louvain'){ - cat(paste0("\n Louvain clustering initialization...\n")) - louvain.cl <- doLouvainCluster(gobject = gobject,nn_network_to_use = 'sNN',network_name = 'sNN.initHMRF',set_seed = hmrf_seed,return_gobject = F, - python_path = python_path,resolution = resolution.cl) - cl.match = louvain.cl$louvain_clus[match(rownames(y),louvain.cl$cell_ID)] - mu <- aggregate(y,by = list(cl.match),FUN = mean) - } - - rownames(mu) <- mu[,1] - mu <- t(mu[,-1]) - k <- dim(mu)[2] - lclust <- lapply(colnames(mu), function(x) which(cl.match == x)) - cat(paste0("\n k is automatically identified as ",k, ".\n")) - } + if (cl.method == "km") { + message("Kmeans initialization...") + kk <- smfishHmrf::smfishHmrf.generate.centroid( + y = y, par_k = k, par_seed = hmrf_seed, + nstart = nstart + ) + mu <- t(kk$centers) + lclust <- lapply(1:k, function(x) which(kk$cluster == x)) + } else { + ##### need to double check leiden and louvain cluster functions + gobject@dimension_reduction$cells$spatial <- NULL + gobject@dimension_reduction$cells$spatial$spatial_feat <- NULL + gobject@dimension_reduction$cells$spatial$spatial_feat$name <- "spatial_feat" + gobject@dimension_reduction$cells$spatial$spatial_feat$reduction_method <- "spatial" + gobject@dimension_reduction$cells$spatial$spatial_feat$coordinates <- y + + gobject <- createNearestNetwork( + gobject = gobject, + dim_reduction_to_use = "spatial", + dim_reduction_name = "spatial_feat", + dimensions_to_use = seq_len(ncol(y)), + name = "sNN.initHMRF" + ) + + if (cl.method == "leiden") { + message("Leiden clustering initialization...") + leiden.cl <- doLeidenCluster( + gobject = gobject, + nn_network_to_use = "sNN", + network_name = "sNN.initHMRF", + set_seed = hmrf_seed, + return_gobject = FALSE, + python_path = python_path, + resolution = resolution.cl + ) + cl.match <- leiden.cl$leiden_clus[ + match(rownames(y), leiden.cl$cell_ID)] + mu <- aggregate(y, by = list(cl.match), FUN = mean) + } else if (cl.method == "louvain") { + message("Louvain clustering initialization...") + louvain.cl <- doLouvainCluster( + gobject = gobject, + nn_network_to_use = "sNN", + network_name = "sNN.initHMRF", + set_seed = hmrf_seed, + return_gobject = FALSE, + python_path = python_path, + resolution = resolution.cl + ) + cl.match <- louvain.cl$louvain_clus[ + match(rownames(y), louvain.cl$cell_ID)] + mu <- aggregate(y, by = list(cl.match), FUN = mean) + } + + rownames(mu) <- mu[, 1] + mu <- t(mu[, -1]) + k <- dim(mu)[2] + lclust <- lapply(colnames(mu), function(x) which(cl.match == x)) + cat(paste0("k is automatically identified as ", k)) + } - damp <- array(0, c(k)) - sigma <- array(0, c(m, m, k)) - for (i in 1:k) { - sigma[, , i] <- cov(y[lclust[[i]], ]) - di <- smfishHmrf::findDampFactor( - sigma[, , i], factor = 
factor_step, - d_cutoff = tolerance, startValue = 1e-04 - ) - damp[i] <- ifelse(is.null(di), 0, di) + damp <- array(0, c(k)) + sigma <- array(0, c(m, m, k)) + for (i in 1:k) { + sigma[, , i] <- cov(y[lclust[[i]], ]) + di <- smfishHmrf::findDampFactor( + sigma[, , i], + factor = factor_step, + d_cutoff = tolerance, startValue = 1e-04 + ) + damp[i] <- ifelse(is.null(di), 0, di) + } + message("Done") + list( + y = y, nei = nei, numnei = numnei, blocks = blocks, + damp = damp, mu = mu, sigma = sigma, k = k, genes = colnames(y), + edgelist = edgelist, init.cl = lclust, spat_unit = spat_unit, + feat_type = feat_type + ) } - cat(paste0("\n Done\n")) - list(y = y, nei = nei, numnei = numnei, blocks = blocks, - damp = damp, mu = mu, sigma = sigma, k = k, genes = colnames(y), - edgelist = edgelist, init.cl = lclust,spat_unit = spat_unit, feat_type = feat_type) - } @@ -1427,112 +1643,127 @@ initHMRF_V2 = #' @title doHMRF_V2 #' @name doHMRF_V2 #' @description function to run HMRF model -#' @param HMRF_init_obj initialization object list returned from initHMRF() function -#' @param betas beta value of the HMRF model, controlling the smoothness of clustering. NULL value of beta will provide default values based on feature numbers, otherwise, a vector of three values: initial beta, beta increment, and number of betas +#' @param HMRF_init_obj initialization object list returned from initHMRF() +#' function +#' @param betas beta value of the HMRF model, controlling the smoothness of +#' clustering. NULL value of beta will provide default values based on feature +#' numbers, otherwise, a vector of three values: initial beta, beta increment, +#' and number of betas #' @details -#' This function will run a HMRF model after initialization of HMRF. Of note is the beta parameter, the smoothing parameter. -#' If the users are interested in selecting results from different smoothness, we recommend running a range of betas, -#' hence betas specify what this range is. For example, betas=c(0,10,5) will run for the following betas: 0, 10, 20, 30, 40. -#' betas=c(0,5,2) will run for betas: 0, 5, 10. Setting the beta can use the following guideline: +#' This function will run a HMRF model after initialization of HMRF. Of note +#' is the beta parameter, the smoothing parameter. +#' If the users are interested in selecting results from different smoothness, +#' we recommend running a range of betas, +#' hence betas specify what this range is. For example, betas=c(0,10,5) will +#' run for the following betas: 0, 10, 20, 30, 40. +#' betas=c(0,5,2) will run for betas: 0, 5, 10. Setting the beta can use the +#' following guideline: #' If number of features N is 10 0)) { - stop("\n please provide betas as a vector of 3 non-negative numbers (initial value, nicrement, total iteration number) \n") - }else{ - beta_init = betas[1] - beta_increment = betas[2] - beta_num_iter = betas[3] - beta_seq = (1:beta_num_iter - 1) * beta_increment + beta_init - beta_seq = sort(unique(c(0, beta_seq))) - } - - res <- c() - for (beta_current in beta_seq) { - print(sprintf("Doing beta=%.3f", beta_current)) - tc.hmrfem <- smfishHmrf::smfishHmrf.hmrfem.multi( - y = y, neighbors = nei, - beta = beta_current, numnei = numnei, blocks = blocks, - mu = mu, sigma = sigma, verbose = T, err = 1e-07, - maxit = 50, dampFactor = damp + 'Identification of spatially associated subpopulations by combining + scRNAseq and sequential fluorescence in situ hybridization data' + Nature biotechnology 36 (12), 1183-1190. 
2018" ) - # if (mean(apply(tc.hmrfem$prob, 1, max) < (1/k + 0.01)) > 0.5) { - # cat(paste0("\n HMRF is stopping at large beta >= ", - # beta_current, ", numerical error occurs, results of smaller betas were stored\n")) - # (break)() - # } - t_key <- sprintf("k=%d b=%.2f", k, beta_current) - tc.hmrfem$sigma = NULL - tc.hmrfem$mu = NULL - rownames(tc.hmrfem$prob) = rownames(y) - rownames(tc.hmrfem$unnormprob) = rownames(y) - names(tc.hmrfem$class) = rownames(y) - res[[t_key]] <- tc.hmrfem - } - result.hmrf = res - result.hmrf[['spat_unit']] = spat_unit - result.hmrf[['feat_type']] = feat_type - class(result.hmrf) <- append(class(result.hmrf), "HMRFoutput") - return(result.hmrf) + + message("Please find more explanation and instruction of the HMRF function + on \n https://bitbucket.org/qzhudfci/smfishhmrf-r/src/master/TRANSITION.md") + if (!"y" %in% names(HMRF_init_obj)) { + stop("expression matrix 'y' not in the intialization object") + } + if (!"nei" %in% names(HMRF_init_obj)) { + stop("neighbor matrix 'nei' not in the intialization object") + } + if (!"numnei" %in% names(HMRF_init_obj)) { + stop("number of neighbors 'numnei' not in the intialization object") + } + if (!"blocks" %in% names(HMRF_init_obj)) { + stop("iteration groups 'blocks' not in the intialization object") + } + if (!"damp" %in% names(HMRF_init_obj)) { + stop("dampen factors 'damp' not in the intialization object") + } + if (!"mu" %in% names(HMRF_init_obj)) { + stop("initial mean vector 'mu' not in the intialization object") + } + if (!"sigma" %in% names(HMRF_init_obj)) { + stop("initial covariance matrix 'sigma' not in the intialization + object") + } + if (!"k" %in% names(HMRF_init_obj)) { + stop("cluster number 'k' not in the intialization object") + } + if (!"spat_unit" %in% names(HMRF_init_obj)) { + HMRF_init_obj[["spat_unit"]] <- NULL + } + if (!"feat_type" %in% names(HMRF_init_obj)) { + HMRF_init_obj[["feat_type"]] <- NULL + } + + y <- HMRF_init_obj$y + nei <- HMRF_init_obj$nei + numnei <- HMRF_init_obj$numnei + blocks <- HMRF_init_obj$blocks + damp <- HMRF_init_obj$damp + mu <- HMRF_init_obj$mu + sigma <- HMRF_init_obj$sigma + k <- HMRF_init_obj$k + spat_unit <- HMRF_init_obj$spat_unit + feat_type <- HMRF_init_obj$feat_type + + if (is.null(betas)) { + beta_seq <- max(ceiling(ncol(y) / 10), 2) + cat(paste0("Default value beta = ", beta_seq, " is used...")) + } else if (length(betas) != 3 || (sum(betas[1:3] < 0) > 0)) { + stop("please provide betas as a vector of 3 non-negative numbers + (initial value, nicrement, total iteration number)") + } else { + beta_init <- betas[1] + beta_increment <- betas[2] + beta_num_iter <- betas[3] + beta_seq <- (1:beta_num_iter - 1) * beta_increment + beta_init + beta_seq <- sort(unique(c(0, beta_seq))) + } + + res <- c() + for (beta_current in beta_seq) { + print(sprintf("Doing beta=%.3f", beta_current)) + tc.hmrfem <- smfishHmrf::smfishHmrf.hmrfem.multi( + y = y, neighbors = nei, + beta = beta_current, + numnei = numnei, + blocks = blocks, + mu = mu, + sigma = sigma, + verbose = TRUE, + err = 1e-07, + maxit = 50, + dampFactor = damp + ) + + t_key <- sprintf("k=%d b=%.2f", k, beta_current) + tc.hmrfem$sigma <- NULL + tc.hmrfem$mu <- NULL + rownames(tc.hmrfem$prob) <- rownames(y) + rownames(tc.hmrfem$unnormprob) <- rownames(y) + names(tc.hmrfem$class) <- rownames(y) + res[[t_key]] <- tc.hmrfem + } + result.hmrf <- res + result.hmrf[["spat_unit"]] <- spat_unit + result.hmrf[["feat_type"]] <- feat_type + class(result.hmrf) <- append(class(result.hmrf), "HMRFoutput") + 
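(Illustrative aside, not part of this diff: a sketch of how the initialization object might flow through the rest of the HMRF workflow, assuming the `hmrf_init` object from initHMRF_V2 above and a Giotto object `g`; names and values are placeholders.)

# betas = c(0, 10, 5) runs beta = 0, 10, 20, 30, 40 (see details above)
hmrf_res <- doHMRF_V2(HMRF_init_obj = hmrf_init, betas = c(0, 10, 5))

# append domain labels to cell metadata; columns are named e.g. "hmrf1 k=10 b=20.00"
g <- addHMRF_V2(gobject = g, HMRFoutput = hmrf_res, name = "hmrf1")

# plot the domains for one k / beta combination
viewHMRFresults_V2(gobject = g, k = 10, betas = 20, hmrf_name = "hmrf1")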
return(result.hmrf) } @@ -1543,43 +1774,46 @@ doHMRF_V2 = function (HMRF_init_obj, betas = NULL) #' @param HMRFoutput result object from HMRF model #' @param name name of HMRF models #' @details -#' This function appends HMRF domain clusters to corresponding cell meta data for all the beta values, with the given HMRF model names. For example, if name = ‘hmrf1’ and name of result in HMRFoutput is ‘k=8 b=0.00’, the appended cell meta data column will be named with ‘hmrf1 k=8 b=0.00’ +#' This function appends HMRF domain clusters to corresponding cell meta data +#' for all the beta values, with the given HMRF model names. For example, if +#' name = ‘hmrf1’ and name of result in HMRFoutput is ‘k=8 b=0.00’, the +#' appended cell meta data column will be named with ‘hmrf1 k=8 b=0.00’ #' @export -addHMRF_V2 = function (gobject, HMRFoutput, name = 'hmrf') -{ - if (!"HMRFoutput" %in% class(HMRFoutput)) { - stop("\n HMRFoutput needs to be output from doHMRF_V2() \n") - } - if (!"spat_unit" %in% names(HMRFoutput)) { - HMRFoutput[['spat_unit']] = NULL - } - if (!"feat_type" %in% names(HMRFoutput)) { - HMRFoutput[['feat_type']] = NULL - } - spat_unit = HMRFoutput$spat_unit - feat_type = HMRFoutput$feat_type - - cx = getCellMetadata( - gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE - ) - ordered_cell_IDs = cx$cell_ID - - for (i in seq_along(grep('k', names(HMRFoutput)))) { - gobject = addCellMetadata( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - column_cell_ID = "cell_ID", - new_metadata = HMRFoutput[[i]]$class[match(ordered_cell_IDs,names(HMRFoutput[[i]]$class))], - vector_name = paste(name,names(HMRFoutput)[i]), - by_column = TRUE - ) - } - return(gobject) +addHMRF_V2 <- function(gobject, HMRFoutput, name = "hmrf") { + if (!"HMRFoutput" %in% class(HMRFoutput)) { + stop("HMRFoutput needs to be output from doHMRF_V2()") + } + if (!"spat_unit" %in% names(HMRFoutput)) { + HMRFoutput[["spat_unit"]] <- NULL + } + if (!"feat_type" %in% names(HMRFoutput)) { + HMRFoutput[["feat_type"]] <- NULL + } + spat_unit <- HMRFoutput$spat_unit + feat_type <- HMRFoutput$feat_type + + cx <- getCellMetadata( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + ordered_cell_IDs <- cx$cell_ID + + for (i in seq_along(grep("k", names(HMRFoutput)))) { + gobject <- addCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + column_cell_ID = "cell_ID", + new_metadata = HMRFoutput[[i]]$class[match( + ordered_cell_IDs, names(HMRFoutput[[i]]$class))], + vector_name = paste(name, names(HMRFoutput)[i]), + by_column = TRUE + ) + } + return(gobject) } @@ -1605,84 +1839,109 @@ addHMRF_V2 = function (gobject, HMRFoutput, name = 'hmrf') #' @param save_param other saving parameters #' @param \dots additional params to pass to plotting #' @details -#' This function plots spatial map of HMRF domain clusters for multiple beta with the name (hmrf_name), -#' matching the first part of the cell meta column names with HMRF clusters (for example name of ‘hmrf1 k=8 b=0.00’ is ‘hmrf1’) +#' This function plots spatial map of HMRF domain clusters for multiple beta +#' with the name (hmrf_name), +#' matching the first part of the cell meta column names with HMRF clusters +#' (for example name of ‘hmrf1 k=8 b=0.00’ is ‘hmrf1’) #' @export -viewHMRFresults_V2 = - function (gobject, k, betas, - hmrf_name, - spat_unit = NULL, - feat_type = NULL, - third_dim = FALSE, - cow_n_col = 2, - 
cow_rel_h = 1, - cow_rel_w = 1, - cow_align = "h", - show_plot = TRUE, - save_plot = TRUE, - return_plot = TRUE, - default_save_name = 'HMRF_result', - save_param = list(), - ...) - { - # beta_seq = round(betas,digits = 2) - # t_key = paste0(hmrf_name,'_k', k, '_b.',beta_seq) - t_key = paste(hmrf_name,sprintf("k=%d b=%.2f", k, betas)) +viewHMRFresults_V2 <- + function(gobject, k, betas, + hmrf_name, + spat_unit = NULL, + feat_type = NULL, + third_dim = FALSE, + cow_n_col = 2, + cow_rel_h = 1, + cow_rel_w = 1, + cow_align = "h", + show_plot = TRUE, + save_plot = TRUE, + return_plot = TRUE, + default_save_name = "HMRF_result", + save_param = list(), + ...) { + # beta_seq = round(betas,digits = 2) + # t_key = paste0(hmrf_name,'_k', k, '_b.',beta_seq) + t_key <- paste(hmrf_name, sprintf("k=%d b=%.2f", k, betas)) + + meta_names <- colnames(combineMetadata( + gobject = gobject, spat_unit = spat_unit, feat_type = feat_type)) + + if (length(setdiff(t_key, meta_names)) > 0) { + beta_null <- paste(betas[which(!t_key %in% meta_names)], + collapse = ",") + stop(paste0('\n HMRF result "', hmrf_name, '" of k = ', k, + ", beta = ", beta_null, + " was not found in the Giotto object.")) + } - meta_names = colnames(combineMetadata(gobject = gobject,spat_unit = spat_unit,feat_type = feat_type)) + savelist <- list() + for (kk in seq_along(t_key)) + { + if (third_dim == TRUE) { + pl <- spatPlot3D( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + cell_color = t_key[kk], + show_plot = FALSE, + save_plot = FALSE, + title = t_key[kk], + default_save_name = "HMRF_result", + return_plot = TRUE, + ... + ) + } else { + pl <- spatPlot2D( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + cell_color = t_key[kk], + show_plot = FALSE, + save_plot = FALSE, + title = t_key[kk], + cow_n_col = 1, + cow_rel_h = 1, + cow_rel_w = 1, + cow_align = "h", + default_save_name = "HMRF_result", + return_plot = TRUE, + ... + ) + } + savelist[[kk]] <- pl + } - if(length(setdiff(t_key,meta_names))>0) - { - beta_null = paste(betas[which(!t_key%in%meta_names)],collapse = ',') - stop(paste0('\n HMRF result "',hmrf_name,'" of k = ',k,', beta = ',beta_null,' was not found in the Giotto object. \n')) - } + if (cow_n_col > length(savelist)) { + cow_n_col <- length(savelist) + } - savelist = list() - for(kk in seq_along(t_key)) - { - if (third_dim == TRUE) { - pl = spatPlot3D(gobject = gobject, spat_unit = spat_unit,feat_type = feat_type, - cell_color = t_key[kk], show_plot = F, save_plot = F,title = t_key[kk], - default_save_name = 'HMRF_result', return_plot = T,...) - }else{ - pl = spatPlot2D(gobject = gobject, spat_unit = spat_unit,feat_type = feat_type, - cell_color = t_key[kk], show_plot = F, save_plot = F,title = t_key[kk], - cow_n_col = 1, cow_rel_h = 1, cow_rel_w = 1, cow_align = "h", - default_save_name = 'HMRF_result', return_plot = T,...) 
- } - savelist[[kk]] = pl - } + # combine plots with cowplot + combo_plot <- cowplot::plot_grid( + plotlist = savelist, + ncol = cow_n_col, + rel_heights = cow_rel_h, + rel_widths = cow_rel_w, + align = cow_align + ) + + # assign default ncol and nrow if not in save_param + if (!"ncol" %in% names(save_param)) { + save_param[["ncol"]] <- cow_n_col + } + if (!"nrow" %in% names(save_param)) { + save_param[["nrow"]] <- ceiling(length(savelist) / cow_n_col) + } - if (cow_n_col > length(savelist)) { - cow_n_col = length(savelist) + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = combo_plot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } - # {stop("\n please provide a cow_n_col smaller than the number of plots")} - - # combine plots with cowplot - combo_plot <- cowplot::plot_grid(plotlist = savelist, - ncol = cow_n_col, - rel_heights = cow_rel_h, - rel_widths = cow_rel_w, - align = cow_align) - - # assign default ncol and nrow if not in save_param - if(!'ncol'%in%names(save_param)) - {save_param[['ncol']] = cow_n_col} - if(!'nrow'%in%names(save_param)) - {save_param[['nrow']] = ceiling(length(savelist)/cow_n_col)} - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = combo_plot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) - } - - - diff --git a/R/python_scrublet.R b/R/python_scrublet.R index f00796fb6..6606cd4e8 100644 --- a/R/python_scrublet.R +++ b/R/python_scrublet.R @@ -1,4 +1,3 @@ - #' @title doScrubletDetect #' @name doScrubletDetect #' @description run *scrublet* doublet detection for raw expression. @@ -22,101 +21,110 @@ #' @seealso This function wraps the python package scrublet #' \doi{10.1016/j.cels.2018.11.005} #' @return if `return_gobject = FALSE`, a `data.table` cell_ID, doublet scores, -#' and classifications are returned. If `TRUE`, that information is appended into -#' the input `giotto` object's metadata and the `giotto` object is returned. +#' and classifications are returned. If `TRUE`, that information is appended +#' into the input `giotto` object's metadata and the `giotto` object is +#' returned. #' @md #' @export -doScrubletDetect = function(gobject, - feat_type = NULL, - spat_unit = 'cell', - expression_values = 'raw', - expected_doublet_rate = 0.06, - min_counts = 1, - min_cells = 1, - min_gene_variability_pctl = 85, - n_prin_comps = 30, - return_gobject = TRUE, - seed = 1234) { - - - # verify if optional package is installed - package_check(pkg_name = 'scrublet', - repository = 'pip') - - # print message with information # - message("using 'scrublet' to detect doublets. If used in published research, please cite: \n - Wolock, S. L., Lopez, R. & Klein, A. M. - Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. - Cell Syst. 8, 281-291.e9 (2019). 
- https://doi.org/10.1016/j.cels.2018.11.005") - - # prepare python path and scrublet_script - python_path = readGiottoInstructions(gobject, param = "python_path") - reticulate::use_python(required = T, python = python_path) - python_scrublet_function = system.file("python", "python_scrublet.py", package = 'Giotto') - reticulate::source_python(file = python_scrublet_function,convert = TRUE) - - # set seed - if (!is.null(seed)) { - seed_number = as.numeric(seed) - reticulate::py_set_seed(seed = seed_number, - disable_hash_randomization = TRUE) - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - +doScrubletDetect <- function(gobject, + feat_type = NULL, + spat_unit = "cell", + expression_values = "raw", + expected_doublet_rate = 0.06, + min_counts = 1, + min_cells = 1, + min_gene_variability_pctl = 85, + n_prin_comps = 30, + return_gobject = TRUE, + seed = 1234) { + # verify if optional package is installed + package_check( + pkg_name = "scrublet", + repository = "pip" + ) - # 1. convert input to char for python inputs that must be type int - min_counts = as.character(min_counts) - min_cells = as.character(min_cells) - min_gene_variability_pctl = as.character(min_gene_variability_pctl) - n_prin_comps = as.character(n_prin_comps) + # print message with information # + message("using 'scrublet' to detect doublets. If used in published + research, please cite: \n + Wolock, S. L., Lopez, R. & Klein, A. M. + Scrublet: Computational Identification of Cell Doublets in Single-Cell + Transcriptomic Data. Cell Syst. 8, 281-291.e9 (2019). + https://doi.org/10.1016/j.cels.2018.11.005") + + # prepare python path and scrublet_script + python_path <- readGiottoInstructions(gobject, param = "python_path") + reticulate::use_python(required = TRUE, python = python_path) + python_scrublet_function <- system.file( + "python", "python_scrublet.py", package = "Giotto") + reticulate::source_python(file = python_scrublet_function, convert = TRUE) + + # set seed + if (!is.null(seed)) { + seed_number <- as.numeric(seed) + reticulate::py_set_seed( + seed = seed_number, + disable_hash_randomization = TRUE + ) + } + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - # 2. get expression data - expr_values = get_expression_values( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = expression_values, - output = 'matrix' - ) - # input is a sparse matrix with cells as rows and genes as columns - scr_input <- to_scipy_sparse(expr_values, format = "C", transpose = TRUE) + # 1. convert input to char for python inputs that must be type int + min_counts <- as.character(min_counts) + min_cells <- as.character(min_cells) + min_gene_variability_pctl <- as.character(min_gene_variability_pctl) + n_prin_comps <- as.character(n_prin_comps) - scrublet_out <- python_scrublet( - counts_matrix = scr_input, - expected_doublet_rate = expected_doublet_rate, - min_counts = min_counts, - min_cells = min_cells, - min_gene_variability_pctl = min_gene_variability_pctl, - n_prin_comps = n_prin_comps - ) + # 2. 
get expression data + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = expression_values, + output = "matrix" + ) - scrublet_out <- data.table::data.table( - cell_ID = colnames(expr_values), - doublet_scores = scrublet_out[[1]], - doublet = scrublet_out[[2]] - ) + # input is a sparse matrix with cells as rows and genes as columns + scr_input <- to_scipy_sparse(expr_values, format = "C", transpose = TRUE) + scrublet_out <- python_scrublet( + counts_matrix = scr_input, + expected_doublet_rate = expected_doublet_rate, + min_counts = min_counts, + min_cells = min_cells, + min_gene_variability_pctl = min_gene_variability_pctl, + n_prin_comps = n_prin_comps + ) - if(isTRUE(return_gobject)) { - # Add to metadata - gobject <- addCellMetadata( - gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - new_metadata = scrublet_out, - by_column = TRUE, - column_cell_ID = "cell_ID" + scrublet_out <- data.table::data.table( + cell_ID = colnames(expr_values), + doublet_scores = scrublet_out[[1]], + doublet = scrublet_out[[2]] ) - return(gobject) - } - return(scrublet_out) + + if (isTRUE(return_gobject)) { + # Add to metadata + gobject <- addCellMetadata( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + new_metadata = scrublet_out, + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + return(gobject) + } + + return(scrublet_out) } diff --git a/R/spatial_clusters.R b/R/spatial_clusters.R index e4d5d2681..ee9058bf0 100644 --- a/R/spatial_clusters.R +++ b/R/spatial_clusters.R @@ -1,5 +1,3 @@ - - #' @title Remove hetero edges from igraph #' @name .igraph_remove_hetero_edges #' @description @@ -12,28 +10,27 @@ #' @md #' @return igraph #' @keywords internal -.igraph_remove_hetero_edges = function(g, clus_attr) { +.igraph_remove_hetero_edges <- function(g, clus_attr) { + clus_attr_values <- igraph::vertex_attr(g, name = clus_attr) - clus_attr_values = igraph::vertex_attr(g, name = clus_attr) + for (n in unique(clus_attr_values)) { + # find all vertices of the attribute + nv <- igraph::V(g)$name[clus_attr_values == n] - for (n in unique(clus_attr_values)) { - # find all vertices of the attribute - nv = igraph::V(g)$name[clus_attr_values == n] + # find edges that include these vertices + n_all_edges <- igraph::E(g)[.inc(igraph::V(g)[nv])] %>% + igraph::as_ids() - # find edges that include these vertices - n_all_edges = igraph::E(g)[.inc(igraph::V(g)[nv])] %>% - igraph::as_ids() + # find edges associated with only these vertices + n_internal_edges <- igraph::E(g)[nv %--% nv] %>% + igraph::as_ids() - # find edges associated with only these vertices - n_internal_edges = igraph::E(g)[nv %--% nv] %>% - igraph::as_ids() + het_edges <- n_all_edges[!n_all_edges %in% n_internal_edges] - het_edges = n_all_edges[!n_all_edges %in% n_internal_edges] + g <- igraph::delete_edges(g, edges = het_edges) + } - g = igraph::delete_edges(g, edges = het_edges) - } - - g + g } @@ -47,11 +44,11 @@ #' @param clus_name character. 
name to assign column of clustering info #' @return data.table #' @keywords internal -.igraph_vertex_membership = function(g, clus_name) { - membership = igraph::components(g)$membership %>% - data.table::as.data.table(keep.rownames = TRUE) - data.table::setnames(membership, c('cell_ID', clus_name)) - membership +.igraph_vertex_membership <- function(g, clus_name) { + membership <- igraph::components(g)$membership %>% + data.table::as.data.table(keep.rownames = TRUE) + data.table::setnames(membership, c("cell_ID", clus_name)) + membership } @@ -68,80 +65,82 @@ #' information to split #' @examples #' library(Giotto) -#' g = GiottoData::loadGiottoMini('vizgen') -#' activeSpatUnit(g) = 'aggregate' -#' spatPlot2D(g, cell_color = 'leiden_clus') +#' g <- GiottoData::loadGiottoMini("vizgen") +#' activeSpatUnit(g) <- "aggregate" +#' spatPlot2D(g, cell_color = "leiden_clus") #' -#' g = spatialSplitCluster(g, -#' cluster_col = 'leiden_clus', -#' split_clus_name = 'new') +#' g <- spatialSplitCluster(g, +#' cluster_col = "leiden_clus", +#' split_clus_name = "new" +#' ) #' # don't show legend since there are too many categories generated -#' spatPlot2D(g, cell_color = 'new', show_legend = FALSE) +#' spatPlot2D(g, cell_color = "new", show_legend = FALSE) #' @export -spatialSplitCluster = function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network_name = 'Delaunay_network', - cluster_col, - split_clus_name = paste0(cluster_col, '_split')) { - - # NSE vars - cell_ID = NULL - - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - cell_meta = getCellMetadata( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = FALSE - ) - - sn = getSpatialNetwork( - gobject = gobject, - spat_unit = spat_unit, - name = spatial_network_name, - output = 'spatialNetworkObj', - copy_obj = FALSE, - verbose = FALSE, - ) - - clus_info = cell_meta[, c('cell_ID', cluster_col), with = FALSE] # subset to needed cols - g = GiottoClass::spat_net_to_igraph(sn) # convert spatialNetworkObject to igraph - - # assign cluster info to igraph nodes - clus_values = clus_info[match(igraph::V(g)$name, cell_ID), get(cluster_col)] - igraph::V(g)$cluster = clus_values - - # split cluster by spatial igraph - g = .igraph_remove_hetero_edges( - g = g, - clus_attr = 'cluster' - ) - - # get new clusterings - new_clus_dt = .igraph_vertex_membership( - g = g, - clus_name = split_clus_name - ) - - gobject = addCellMetadata( - gobject, - spat_unit = spat_unit , - new_metadata = new_clus_dt, - by_column = TRUE, - column_cell_ID = 'cell_ID' - ) - - gobject +spatialSplitCluster <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + spatial_network_name = "Delaunay_network", + cluster_col, + split_clus_name = paste0(cluster_col, "_split")) { + # NSE vars + cell_ID <- NULL + + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + cell_meta <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = FALSE + ) + + sn <- getSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "spatialNetworkObj", + copy_obj = FALSE, + verbose = FALSE, + ) + + clus_info <- cell_meta[, c("cell_ID", cluster_col), 
with = FALSE] + # subset to needed cols + g <- GiottoClass::spat_net_to_igraph(sn) + # convert spatialNetworkObject to igraph + + # assign cluster info to igraph nodes + clus_values <- clus_info[ + match(igraph::V(g)$name, cell_ID), get(cluster_col)] + igraph::V(g)$cluster <- clus_values + + # split cluster by spatial igraph + g <- .igraph_remove_hetero_edges( + g = g, + clus_attr = "cluster" + ) + + # get new clusterings + new_clus_dt <- .igraph_vertex_membership( + g = g, + clus_name = split_clus_name + ) + + gobject <- addCellMetadata( + gobject, + spat_unit = spat_unit, + new_metadata = new_clus_dt, + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + + gobject } - - - - - diff --git a/R/spatial_enrichment.R b/R/spatial_enrichment.R index 535df741a..3e0794b78 100644 --- a/R/spatial_enrichment.R +++ b/R/spatial_enrichment.R @@ -1,45 +1,49 @@ - ## create spatial enrichment matrix #### #' @title makeSignMatrixPAGE -#' @description Function to convert a list of signature genes (e.g. for cell types or processes) into -#' a binary matrix format that can be used with the PAGE enrichment option. Each cell type or process should -#' have a vector of cell-type or process specific genes. These vectors need to be combined into a list (sign_list). -#' The names of the cell types or processes that are provided in the list need to be given (sign_names). +#' @description Function to convert a list of signature genes +#' (e.g. for cell types or processes) into +#' a binary matrix format that can be used with the PAGE enrichment option. +#' Each cell type or process should +#' have a vector of cell-type or process specific genes. These vectors need to +#' be combined into a list (sign_list). +#' The names of the cell types or processes that are provided in the list need +#' to be given (sign_names). 
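# A minimal usage sketch of makeSignMatrixPAGE() as described above; the
# cell-type names and marker genes are placeholders, not a real signature panel.
toy_sign_list <- list(
    Tcell = c("Cd3e", "Cd3d", "Cd8a"),
    Bcell = c("Cd79a", "Cd79b", "Ms4a1")
)
toy_sign_matrix <- makeSignMatrixPAGE(
    sign_names = names(toy_sign_list),
    sign_list  = toy_sign_list
)
toy_sign_matrix # genes as rows, cell types as columns, entries are 1 (marker) or 0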
#' @param sign_names vector with names for each provided gene signature #' @param sign_list list of genes (signature) #' @return matrix #' @seealso \code{\link{PAGEEnrich}} #' @export -makeSignMatrixPAGE = function(sign_names, - sign_list) { - - ## check input - if(!inherits(sign_list, 'list')) { - stop('\n sign_list needs to be a list of signatures for each cell type / process \n') - } - if(length(sign_names) != length(sign_list)) { - stop('\n the number of names needs to match the number of signatures provided \n') - } - - ## create genes and signatures - genes = do.call('c', sign_list) - types = lapply(seq_along(sign_names), FUN = function(x) { - - subset = sign_list[[x]] - name_subset = sign_names[[x]] - - res = rep(x = name_subset, length(subset)) +makeSignMatrixPAGE <- function(sign_names, + sign_list) { + ## check input + if (!inherits(sign_list, "list")) { + stop("sign_list needs to be a list of signatures for each cell type / + process") + } + if (length(sign_names) != length(sign_list)) { + stop("the number of names needs to match the number of signatures + provided") + } - }) - mydt = data.table::data.table(genes = genes, types = unlist(types), value = 1) + ## create genes and signatures + genes <- do.call("c", sign_list) + types <- lapply(seq_along(sign_names), FUN = function(x) { + subset <- sign_list[[x]] + name_subset <- sign_names[[x]] - # convert data.table to signature matrix - dtmatrix = data.table::dcast.data.table(mydt, formula = genes~types, value.var = 'value', fill = 0) - final_sig_matrix = Matrix::as.matrix(dtmatrix[,-1]); rownames(final_sig_matrix) = dtmatrix$genes + res <- rep(x = name_subset, length(subset)) + }) + mydt <- data.table::data.table( + genes = genes, types = unlist(types), value = 1) - return(final_sig_matrix) + # convert data.table to signature matrix + dtmatrix <- data.table::dcast.data.table( + mydt, formula = genes ~ types, value.var = "value", fill = 0) + final_sig_matrix <- Matrix::as.matrix(dtmatrix[, -1]) + rownames(final_sig_matrix) <- dtmatrix$genes + return(final_sig_matrix) } @@ -56,55 +60,56 @@ makeSignMatrixPAGE = function(sign_names, #' @return matrix #' @seealso \code{\link{runDWLSDeconv}} #' @export -makeSignMatrixDWLSfromMatrix = function(matrix, - sign_gene, - cell_type_vector) { - - - # 1. check if cell_type_vector and matrix are compatible - if(ncol(matrix) != length(cell_type_vector)) { - stop('ncol(matrix) needs to be the same as length(cell_type_vector)') - } - - # check input for sign_gene - if(!is.character(sign_gene)) { - stop('\n sign_gene needs to be a character vector of cell type specific genes \n') - } - +makeSignMatrixDWLSfromMatrix <- function(matrix, + sign_gene, + cell_type_vector) { + # 1. check if cell_type_vector and matrix are compatible + if (ncol(matrix) != length(cell_type_vector)) { + stop("ncol(matrix) needs to be the same as length(cell_type_vector)") + } - # 2. get the common genes from the matrix and vector of signature genes - intersect_sign_gene = intersect(rownames(matrix), sign_gene) - matrix_subset = matrix[intersect_sign_gene, ] + # check input for sign_gene + if (!is.character(sign_gene)) { + stop("sign_gene needs to be a character vector of cell type specific + genes") + } - # 3. for each cell type - # calculate average expression for all signature genes - signMatrix = matrix(data = NA, - nrow = nrow(matrix_subset), - ncol = length(unique(cell_type_vector))) + # 2. 
get the common genes from the matrix and vector of signature genes + intersect_sign_gene <- intersect(rownames(matrix), sign_gene) + matrix_subset <- matrix[intersect_sign_gene, ] - for(cell_type_i in seq_along(unique(cell_type_vector))) { - cell_type = unique(cell_type_vector)[cell_type_i] - selected_cells = colnames(matrix_subset)[cell_type_vector == cell_type] - mean_expr_in_selected_cells = rowMeans_flex(matrix_subset[, selected_cells]) + # 3. for each cell type + # calculate average expression for all signature genes + signMatrix <- matrix( + data = NA, + nrow = nrow(matrix_subset), + ncol = length(unique(cell_type_vector)) + ) - signMatrix[, cell_type_i] = mean_expr_in_selected_cells - } + for (cell_type_i in seq_along(unique(cell_type_vector))) { + cell_type <- unique(cell_type_vector)[cell_type_i] + selected_cells <- colnames(matrix_subset)[cell_type_vector == cell_type] + mean_expr_in_selected_cells <- rowMeans_flex(matrix_subset[ + , selected_cells]) - rownames(signMatrix) = rownames(matrix_subset) - colnames(signMatrix) = unique(cell_type_vector) + signMatrix[, cell_type_i] <- mean_expr_in_selected_cells + } - return(signMatrix) + rownames(signMatrix) <- rownames(matrix_subset) + colnames(signMatrix) <- unique(cell_type_vector) + return(signMatrix) } #' @title makeSignMatrixDWLS -#' @description Function to convert a matrix within a Giotto object into a format -#' that can be used with \code{\link{runDWLSDeconv}} for deconvolution. A vector of cell types -#' for parameter \code{cell_type_vector} can be created from the cell metadata (\code{\link{pDataDT}}). +#' @description Function to convert a matrix within a Giotto object into a +#' format that can be used with \code{\link{runDWLSDeconv}} for deconvolution. +#' A vector of cell types for parameter \code{cell_type_vector} can be created +#' from the cell metadata (\code{\link{pDataDT}}). #' @param gobject Giotto object of single cell #' @param spat_unit spatial unit #' @param feat_type feature type to use @@ -117,50 +122,58 @@ makeSignMatrixDWLSfromMatrix = function(matrix, #' @return matrix #' @seealso \code{\link{runDWLSDeconv}} #' @export -makeSignMatrixDWLS = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - reverse_log = TRUE, - log_base = 2, - sign_gene, - cell_type_vector, - cell_type = NULL) { - - - ## deprecated arguments - if(!is.null(cell_type)) { - warning('\n cell_type is deprecated, use cell_type_vector in the future \n') - cell_type_vector = cell_type - } - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## 1. expression matrix - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - ## 2. reverse log-normalization - if(reverse_log == TRUE) { - expr_values[] = log_base^(expr_values[])-1 - } - - ## 3. 
run signature matrix function - res = makeSignMatrixDWLSfromMatrix(matrix = expr_values[], - sign_gene = sign_gene, - cell_type_vector = cell_type_vector) - - return(res) +makeSignMatrixDWLS <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + reverse_log = TRUE, + log_base = 2, + sign_gene, + cell_type_vector, + cell_type = NULL) { + ## deprecated arguments + if (!is.null(cell_type)) { + warning("cell_type is deprecated, use cell_type_vector in the future") + cell_type_vector <- cell_type + } + + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + ## 1. expression matrix + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- get_expression_values( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + ## 2. reverse log-normalization + if (reverse_log == TRUE) { + expr_values[] <- log_base^(expr_values[]) - 1 + } + + ## 3. run signature matrix function + res <- makeSignMatrixDWLSfromMatrix( + matrix = expr_values[], + sign_gene = sign_gene, + cell_type_vector = cell_type_vector + ) + + return(res) } @@ -174,82 +187,79 @@ makeSignMatrixDWLS = function(gobject, #' @param sc_matrix matrix of single-cell RNAseq expression data #' @param sc_cluster_ids vector of cluster ids #' @param ties_method how to handle rank ties -#' @param gobject if giotto object is given then only genes present in both datasets will be considered +#' @param gobject if giotto object is given then only genes present in both +#' datasets will be considered #' @return matrix #' @seealso \code{\link{rankEnrich}} #' @export -makeSignMatrixRank = function(sc_matrix, - sc_cluster_ids, - ties_method = c("random", "max"), - gobject = NULL) { - - if(inherits(sc_matrix, 'exprObj')) { - sc_matrix = sc_matrix[] - } - if (!inherits(sc_matrix, c("matrix", "Matrix", "DelayedArray"))) { - stop("'sc_matrix' must be a matrix input") - } - - # select ties_method - ties_method = match.arg(ties_method, choices = c("random", "max")) - - # check input - if(length(sc_cluster_ids) != ncol(sc_matrix)) { - stop('Number of columns of the scRNAseq matrix needs to have the same length as the cluster ids') - } - - mean_list = list() - group_list = list() - total_nr_genes = nrow(sc_matrix) - - # calculate means for each cluster group - for(group in seq_along(unique(sc_cluster_ids))) { - - group_id = unique(sc_cluster_ids)[group] - cell_ind = which(sc_cluster_ids == group_id) - cluster_rowmeans = rowMeans_flex(sc_matrix[,cell_ind]) - mean_list[[group_id]] = cluster_rowmeans - group_list[[group]] = rep(group_id, total_nr_genes) - } - - mean_list_res = data.table::as.data.table(do.call('c', mean_list)) - group_list_res = data.table::as.data.table(do.call('c', group_list)) - - # average expression for all cells - av_expression = rowMeans_flex(sc_matrix) - av_expression_res = rep(av_expression, length(unique(sc_cluster_ids))) - - gene_names = rownames(sc_matrix) - gene_names_res = rep(gene_names, length(unique(sc_cluster_ids))) - - # create data.table with genes, mean expression per cluster, mean expression overall and cluster ids - comb_dt = data.table::data.table( - genes = gene_names_res, - mean_expr = mean_list_res[[1]], - av_expr = av_expression_res, - clusters = 
group_list_res[[1]] - ) - - # data.table variables - fold = mean_expr = av_expr = rankFold = clusters = NULL - - # calculate fold change and rank of fold-change - comb_dt[, fold := log2(mean_expr+1)-log2(av_expr+1)] - comb_dt[, rankFold := data.table::frank(-fold, ties.method = ties_method), by = clusters] - - # create matrix - comb_rank_mat = data.table::dcast.data.table(data = comb_dt, genes~clusters, value.var = 'rankFold') - comb_rank_matrix = dt_to_matrix(comb_rank_mat) - comb_rank_matrix = comb_rank_matrix[rownames(sc_matrix), unique(sc_cluster_ids)] - - - # # intersect rank matrix with giotto object if given - # if(!is.null(gobject) & class(gobject) %in% c('giotto')) { - # comb_rank_matrix = comb_rank_matrix[intersect(rownames(comb_rank_matrix), gobject@gene_ID), ] - # } - - return(comb_rank_matrix) +makeSignMatrixRank <- function(sc_matrix, + sc_cluster_ids, + ties_method = c("random", "max"), + gobject = NULL) { + if (inherits(sc_matrix, "exprObj")) { + sc_matrix <- sc_matrix[] + } + if (!inherits(sc_matrix, c("matrix", "Matrix", "DelayedArray"))) { + stop("'sc_matrix' must be a matrix input") + } + + # select ties_method + ties_method <- match.arg(ties_method, choices = c("random", "max")) + + # check input + if (length(sc_cluster_ids) != ncol(sc_matrix)) { + stop("Number of columns of the scRNAseq matrix needs to have the same + length as the cluster ids") + } + + mean_list <- list() + group_list <- list() + total_nr_genes <- nrow(sc_matrix) + + # calculate means for each cluster group + for (group in seq_along(unique(sc_cluster_ids))) { + group_id <- unique(sc_cluster_ids)[group] + cell_ind <- which(sc_cluster_ids == group_id) + cluster_rowmeans <- rowMeans_flex(sc_matrix[, cell_ind]) + mean_list[[group_id]] <- cluster_rowmeans + group_list[[group]] <- rep(group_id, total_nr_genes) + } + mean_list_res <- data.table::as.data.table(do.call("c", mean_list)) + group_list_res <- data.table::as.data.table(do.call("c", group_list)) + + # average expression for all cells + av_expression <- rowMeans_flex(sc_matrix) + av_expression_res <- rep(av_expression, length(unique(sc_cluster_ids))) + + gene_names <- rownames(sc_matrix) + gene_names_res <- rep(gene_names, length(unique(sc_cluster_ids))) + + # create data.table with genes, mean expression per cluster, mean + # expression overall and cluster ids + comb_dt <- data.table::data.table( + genes = gene_names_res, + mean_expr = mean_list_res[[1]], + av_expr = av_expression_res, + clusters = group_list_res[[1]] + ) + + # data.table variables + fold <- mean_expr <- av_expr <- rankFold <- clusters <- NULL + + # calculate fold change and rank of fold-change + comb_dt[, fold := log2(mean_expr + 1) - log2(av_expr + 1)] + comb_dt[, rankFold := data.table::frank( + -fold, ties.method = ties_method), by = clusters] + + # create matrix + comb_rank_mat <- data.table::dcast.data.table( + data = comb_dt, genes ~ clusters, value.var = "rankFold") + comb_rank_matrix <- dt_to_matrix(comb_rank_mat) + comb_rank_matrix <- comb_rank_matrix[ + rownames(sc_matrix), unique(sc_cluster_ids)] + + return(comb_rank_matrix) } @@ -261,63 +271,72 @@ makeSignMatrixRank = function(sc_matrix, #' @title PAGE permutation #' @description creates permutation for the PAGEEnrich test #' @keywords internal -.do_page_permutation = function(gobject, - sig_gene, - ntimes){ - # check available gene - available_ct<-c() - for (i in colnames(sig_gene)){ - gene_i=rownames(sig_gene)[which(sig_gene[,i]==1)] - overlap_i=intersect(gene_i,rownames(gobject@expression$rna$normalized)) - if 
(length(overlap_i)<=5){ - output<-paste0("Warning, ",i," only has ",length(overlap_i)," overlapped genes. Will remove it.") - - } else { - available_ct<-c(available_ct,i) - } - } - if (length(available_ct)==1){ - stop("Only one cell type available.") - } - # only continue with genes present in both datasets - interGene = intersect(rownames(sig_gene), rownames(gobject@expression$rna$normalized)) - sign_matrix = sig_gene[interGene,available_ct] - - ct_gene_counts<-NULL - for (i in 1:dim(sign_matrix)[2]){ - a<-length(which(sign_matrix[,i]==1)) - ct_gene_counts = c(ct_gene_counts,a) - } - uniq_ct_gene_counts = unique(ct_gene_counts) - background_mean_sd = matrix(data=NA,nrow = length(uniq_ct_gene_counts)+1, ncol = 3) - for (i in seq_along(uniq_ct_gene_counts)){ - gene_num<-uniq_ct_gene_counts[i] - all_sample_names<-NULL - all_sample_list<-NULL - for (j in 1:ntimes){ - set.seed(j) - random_gene = sample(rownames(gobject@expression$rna$normalized),gene_num,replace=FALSE) - ct_name = paste("ct",j,sep="") - all_sample_names = c(all_sample_names,ct_name) - all_sample_list = c(all_sample_list,list(random_gene)) - } - random_sig = makeSignMatrixPAGE(all_sample_names,all_sample_list) - random_DT = runPAGEEnrich(gobject, sign_matrix = random_sig, p_value = F) - background = unlist(random_DT[,2:dim(random_DT)[2]]) - df_row_name = paste("gene_num_",uniq_ct_gene_counts[i],sep="") - list_back_i = c(df_row_name,mean(background), stats::sd(background)) - background_mean_sd[i,] = list_back_i - } - background_mean_sd[length(uniq_ct_gene_counts)+1,] = c("temp","0","1") - df_back = data.frame(background_mean_sd,row.names = 1) - colnames(df_back) = c("mean","sd") - return(df_back) +.do_page_permutation <- function(gobject, + sig_gene, + ntimes) { + # check available gene + available_ct <- c() + for (i in colnames(sig_gene)) { + gene_i <- rownames(sig_gene)[which(sig_gene[, i] == 1)] + overlap_i <- intersect( + gene_i, rownames(gobject@expression$rna$normalized)) + if (length(overlap_i) <= 5) { + output <- paste0( + "Warning, ", i, " only has ", length(overlap_i), + " overlapped genes. 
Will remove it.") + } else { + available_ct <- c(available_ct, i) + } + } + if (length(available_ct) == 1) { + stop("Only one cell type available.") + } + # only continue with genes present in both datasets + interGene <- intersect( + rownames(sig_gene), rownames(gobject@expression$rna$normalized)) + sign_matrix <- sig_gene[interGene, available_ct] + + ct_gene_counts <- NULL + for (i in 1:dim(sign_matrix)[2]) { + a <- length(which(sign_matrix[, i] == 1)) + ct_gene_counts <- c(ct_gene_counts, a) + } + uniq_ct_gene_counts <- unique(ct_gene_counts) + background_mean_sd <- matrix( + data = NA, nrow = length(uniq_ct_gene_counts) + 1, ncol = 3) + for (i in seq_along(uniq_ct_gene_counts)) { + gene_num <- uniq_ct_gene_counts[i] + all_sample_names <- NULL + all_sample_list <- NULL + for (j in 1:ntimes) { + set.seed(j) + random_gene <- sample(rownames( + gobject@expression$rna$normalized), gene_num, replace = FALSE) + ct_name <- paste("ct", j, sep = "") + all_sample_names <- c(all_sample_names, ct_name) + all_sample_list <- c(all_sample_list, list(random_gene)) + } + random_sig <- makeSignMatrixPAGE(all_sample_names, all_sample_list) + random_DT <- runPAGEEnrich( + gobject, + sign_matrix = random_sig, + p_value = FALSE) + background <- unlist(random_DT[, 2:dim(random_DT)[2]]) + df_row_name <- paste("gene_num_", uniq_ct_gene_counts[i], sep = "") + list_back_i <- c(df_row_name, mean(background), stats::sd(background)) + background_mean_sd[i, ] <- list_back_i + } + background_mean_sd[length(uniq_ct_gene_counts) + 1, ] <- c("temp", "0", "1") + df_back <- data.frame(background_mean_sd, row.names = 1) + colnames(df_back) <- c("mean", "sd") + return(df_back) } #' @title runPAGEEnrich_OLD -#' @description Function to calculate gene signature enrichment scores per spatial position using PAGE. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using PAGE. #' @param gobject Giotto object #' @param sign_matrix Matrix of signature genes for each cell type / process #' @param expression_values expression values to use @@ -330,177 +349,189 @@ makeSignMatrixRank = function(sc_matrix, #' @param return_gobject return giotto object #' @return data.table with enrichment results #' @details -#' sign_matrix: a binary matrix with genes as row names and cell-types as column names. -#' Alternatively a list of signature genes can be provided to makeSignMatrixPAGE, which will create -#' the matrix for you. \cr +#' sign_matrix: a binary matrix with genes as row names and cell-types as +#' column names. +#' Alternatively a list of signature genes can be provided to +#' makeSignMatrixPAGE, which will create the matrix for you. \cr #' #' The enrichment Z score is calculated by using method (PAGE) from -#' Kim SY et al., BMC bioinformatics, 2005 as \eqn{Z = ((Sm – mu)*m^(1/2)) / delta}. -#' For each gene in each spot, mu is the fold change values versus the mean expression -#' and delta is the standard deviation. Sm is the mean fold change value of a specific marker gene set -#' and m is the size of a given marker gene set. +#' Kim SY et al., BMC bioinformatics, 2005 as +#' \eqn{Z = ((Sm – mu)*m^(1/2)) / delta}. +#' For each gene in each spot, mu is the fold change values versus the mean +#' expression and delta is the standard deviation. Sm is the mean fold change +#' value of a specific marker gene set and m is the size of a given marker +#' gene set. 
#' @seealso \code{\link{makeSignMatrixPAGE}} #' @export runPAGEEnrich_OLD <- function(gobject, - sign_matrix, - expression_values = c('normalized', 'scaled', 'custom'), - reverse_log_scale = TRUE, - logbase = 2, - output_enrichment = c('original', 'zscore'), - p_value = FALSE, - n_times = 1000, - name = NULL, - return_gobject = TRUE) { - - - # expression values to be used - values = match.arg(expression_values, c('normalized', 'scaled', 'custom')) - expr_values = get_expression_values(gobject = gobject, values = values) - - # check parameters - if(is.null(name)) name = 'PAGE' - - # check available gene - available_ct<-c() - for (i in colnames(sign_matrix)){ - gene_i=rownames(sign_matrix)[which(sign_matrix[,i]==1)] - overlap_i=intersect(gene_i,rownames(expr_values)) - if (length(overlap_i)<=5){ - output<-paste0("Warning, ",i," only has ",length(overlap_i)," overlapped genes. Will remove it.") + sign_matrix, + expression_values = c("normalized", "scaled", "custom"), + reverse_log_scale = TRUE, + logbase = 2, + output_enrichment = c("original", "zscore"), + p_value = FALSE, + n_times = 1000, + name = NULL, + return_gobject = TRUE) { + # expression values to be used + values <- match.arg(expression_values, c("normalized", "scaled", "custom")) + expr_values <- get_expression_values(gobject = gobject, values = values) + + # check parameters + if (is.null(name)) name <- "PAGE" - } else { - available_ct<-c(available_ct,i) + # check available gene + available_ct <- c() + for (i in colnames(sign_matrix)) { + gene_i <- rownames(sign_matrix)[which(sign_matrix[, i] == 1)] + overlap_i <- intersect(gene_i, rownames(expr_values)) + if (length(overlap_i) <= 5) { + output <- paste0( + "Warning, ", i, " only has ", length(overlap_i), + " overlapped genes. Will remove it.") + } else { + available_ct <- c(available_ct, i) + } } - } - if (length(available_ct)==1){ - stop("Only one cell type available.") - } - - # output enrichment - output_enrichment = match.arg(output_enrichment, choices = c('original', 'zscore')) - - # only continue with genes present in both datasets - interGene = intersect(rownames(sign_matrix), rownames(expr_values)) - filterSig = sign_matrix[interGene, available_ct] - signames = rownames(filterSig)[which(filterSig[,1]==1)] - - # calculate mean gene expression - if(reverse_log_scale == TRUE) { - mean_gene_expr = log(rowMeans(logbase^expr_values-1, dims = 1)+1) - } else { - mean_gene_expr = rowMeans(expr_values) - } - geneFold = expr_values - mean_gene_expr - - # calculate sample/spot mean and sd - cellColMean = apply(geneFold,2,mean) - cellColSd = apply(geneFold,2,stats::sd) - - # get enrichment scores - enrichment = matrix(data=NA,nrow = dim(filterSig)[2],ncol=length(cellColMean)) - for (i in (1:dim(filterSig)[2])){ - signames = rownames(filterSig)[which(filterSig[,i]==1)] - sigColMean = apply(geneFold[signames,],2,mean) - m = length(signames) - vectorX = NULL - for (j in(seq_along(cellColMean))){ - Sm = sigColMean[j] - u = cellColMean[j] - sigma = cellColSd[j] - zscore = (Sm - u)* m^(1/2) / sigma - vectorX = append(vectorX,zscore) + if (length(available_ct) == 1) { + stop("Only one cell type available.") } - enrichment[i,] = vectorX - } - - rownames(enrichment) = colnames(filterSig) - colnames(enrichment) = names(cellColMean) - enrichment = t(enrichment) - if(output_enrichment == 'zscore') { - enrichment = scale(enrichment) - } + # output enrichment + output_enrichment <- match.arg( + output_enrichment, choices = c("original", "zscore")) - enrichmentDT = data.table::data.table(cell_ID = 
rownames(enrichment)) - enrichmentDT = cbind(enrichmentDT, data.table::as.data.table(enrichment)) - - - - ## calculate p-values if requested - if (p_value==TRUE){ - - # check available gene - available_ct = c() - for (i in colnames(sign_matrix)){ - gene_i = rownames(sign_matrix)[which(sign_matrix[,i]==1)] - overlap_i = intersect(gene_i,rownames(gobject@expression$rna$normalized)) - - if (length(overlap_i)<=5){ - output = paste0("Warning, ",i," only has ",length(overlap_i)," overlapped genes. It will be removed.") + # only continue with genes present in both datasets + interGene <- intersect(rownames(sign_matrix), rownames(expr_values)) + filterSig <- sign_matrix[interGene, available_ct] + signames <- rownames(filterSig)[which(filterSig[, 1] == 1)] - } else { - available_ct = c(available_ct, i) - } + # calculate mean gene expression + if (reverse_log_scale == TRUE) { + mean_gene_expr <- log(rowMeans(logbase^expr_values - 1, dims = 1) + 1) + } else { + mean_gene_expr <- rowMeans(expr_values) } - - if (length(available_ct) == 1){ - stop("Only one cell type available.") + geneFold <- expr_values - mean_gene_expr + + # calculate sample/spot mean and sd + cellColMean <- apply(geneFold, 2, mean) + cellColSd <- apply(geneFold, 2, stats::sd) + + # get enrichment scores + enrichment <- matrix( + data = NA, nrow = dim(filterSig)[2], ncol = length(cellColMean)) + for (i in (1:dim(filterSig)[2])) { + signames <- rownames(filterSig)[which(filterSig[, i] == 1)] + sigColMean <- apply(geneFold[signames, ], 2, mean) + m <- length(signames) + vectorX <- NULL + for (j in (seq_along(cellColMean))) { + Sm <- sigColMean[j] + u <- cellColMean[j] + sigma <- cellColSd[j] + zscore <- (Sm - u) * m^(1 / 2) / sigma + vectorX <- append(vectorX, zscore) + } + enrichment[i, ] <- vectorX } - # only continue with genes present in both datasets - interGene = intersect(rownames(sign_matrix), rownames(gobject@expression$rna$normalized)) - filter_sign_matrix = sign_matrix[interGene,available_ct] + rownames(enrichment) <- colnames(filterSig) + colnames(enrichment) <- names(cellColMean) + enrichment <- t(enrichment) - background_mean_sd = .do_page_permutation(gobject = gobject, - sig_gene = filter_sign_matrix, - ntimes = n_times) - - for (i in 1:dim(filter_sign_matrix)[2]){ - length_gene = length(which(filter_sign_matrix[,i] == 1)) - join_gene_with_length = paste("gene_num_", length_gene, sep = "") - mean_i = as.numeric(as.character(background_mean_sd[join_gene_with_length,][[1]])) - sd_i = as.numeric(as.character(background_mean_sd[join_gene_with_length,][[2]])) - j = i+1 - enrichmentDT[[j]] = stats::pnorm(enrichmentDT[[j]], mean = mean_i, sd = sd_i, lower.tail = FALSE, log.p = FALSE) + if (output_enrichment == "zscore") { + enrichment <- scale(enrichment) } - } + + enrichmentDT <- data.table::data.table(cell_ID = rownames(enrichment)) + enrichmentDT <- cbind(enrichmentDT, data.table::as.data.table(enrichment)) - ## return object or results ## - if(return_gobject == TRUE) { + ## calculate p-values if requested + if (p_value == TRUE) { + # check available gene + available_ct <- c() + for (i in colnames(sign_matrix)) { + gene_i <- rownames(sign_matrix)[which(sign_matrix[, i] == 1)] + overlap_i <- intersect( + gene_i, rownames(gobject@expression$rna$normalized)) - spenr_names = names(gobject@spatial_enrichment) + if (length(overlap_i) <= 5) { + output <- paste0( + "Warning, ", i, " only has ", length(overlap_i), + " overlapped genes. 
It will be removed.") + } else { + available_ct <- c(available_ct, i) + } + } + + if (length(available_ct) == 1) { + stop("Only one cell type available.") + } - if(name %in% spenr_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') + # only continue with genes present in both datasets + interGene <- intersect( + rownames(sign_matrix), rownames(gobject@expression$rna$normalized)) + filter_sign_matrix <- sign_matrix[interGene, available_ct] + + background_mean_sd <- .do_page_permutation( + gobject = gobject, + sig_gene = filter_sign_matrix, + ntimes = n_times + ) + + for (i in 1:dim(filter_sign_matrix)[2]) { + length_gene <- length(which(filter_sign_matrix[, i] == 1)) + join_gene_with_length <- paste("gene_num_", length_gene, sep = "") + mean_i <- as.numeric(as.character( + background_mean_sd[join_gene_with_length, ][[1]])) + sd_i <- as.numeric(as.character( + background_mean_sd[join_gene_with_length, ][[2]])) + j <- i + 1 + enrichmentDT[[j]] <- stats::pnorm( + enrichmentDT[[j]], mean = mean_i, sd = sd_i, + lower.tail = FALSE, log.p = FALSE) + } } - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_spatial_enrichment') - # parameters to include - parameters_list[[update_name]] = c('method used' = 'PAGE', - 'enrichment name' = name, - 'expression values' = expression_values, - 'reverse log scale' = reverse_log_scale, - 'logbase' = logbase, - 'p-values calculated' = p_value, - 'output enrichment scores' = output_enrichment, - 'p values calculated' = p_value, - 'nr permutations' = n_times) - gobject@parameters = parameters_list - gobject@spatial_enrichment[[name]] = enrichmentDT + ## return object or results ## + if (return_gobject == TRUE) { + spenr_names <- names(gobject@spatial_enrichment) - return(gobject) + if (name %in% spenr_names) { + cat(name, " has already been used, will be overwritten") + } - } else { - return(enrichmentDT) - } + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_spatial_enrichment") + + # parameters to include + parameters_list[[update_name]] <- c( + "method used" = "PAGE", + "enrichment name" = name, + "expression values" = expression_values, + "reverse log scale" = reverse_log_scale, + "logbase" = logbase, + "p-values calculated" = p_value, + "output enrichment scores" = output_enrichment, + "p values calculated" = p_value, + "nr permutations" = n_times + ) + gobject@parameters <- parameters_list + + gobject@spatial_enrichment[[name]] <- enrichmentDT + + return(gobject) + } else { + return(enrichmentDT) + } } @@ -508,213 +539,231 @@ runPAGEEnrich_OLD <- function(gobject, #' @title PAGE data.table method #' @param expr_values matrix of expression values #' @keywords internal -.page_dt_method = function(sign_matrix, - expr_values, - min_overlap_genes = 5, - logbase = 2, - reverse_log_scale = TRUE, - output_enrichment = c('original', 'zscore'), - p_value = FALSE, - include_depletion = FALSE, - n_times = 1000, - max_block = 20e6, - verbose = TRUE) { - - - # data.table variables - Var1 = value = Var2 = V1 = marker = nr_markers = fc = cell_ID = zscore = colmean = colSd = pval = NULL - mean_zscore = sd_zscore = pval_score = NULL - - # output enrichment - output_enrichment = match.arg(output_enrichment, choices = c('original', 'zscore')) - - ## identify available cell types - all_genes = rownames(expr_values) - sign_matrix = 
as.matrix(sign_matrix) - sign_matrix_DT = data.table::as.data.table(reshape2::melt(sign_matrix)) - sign_matrix_DT = sign_matrix_DT[Var1 %in% all_genes] - detected_DT = sign_matrix_DT[, sum(value), by = Var2] - - lost_cell_types_DT = detected_DT[V1 <= min_overlap_genes] - if(nrow(lost_cell_types_DT) > 0) { - for(row in 1:nrow(lost_cell_types_DT)) { - output = paste0("Warning, ",lost_cell_types_DT[row][['Var2']]," only has ",lost_cell_types_DT[row][['V1']]," overlapping genes. Will be removed.") - if(verbose) print(output) - } - } - available_ct = as.character(detected_DT[V1 > min_overlap_genes][['Var2']]) - - if (length(available_ct) == 1){ - stop("Only one cell type available.") - } - - # create subset of sinature matrix - interGene = intersect(rownames(sign_matrix), rownames(expr_values)) - filterSig = sign_matrix[interGene, available_ct] - - # create fold expression for each gene in each spot - # calculate mean gene expression - if(reverse_log_scale == TRUE) { - mean_gene_expr = log(rowMeans(logbase^expr_values-1, dims = 1)+1) - } else { - mean_gene_expr = rowMeans(expr_values) - } - geneFold = expr_values - mean_gene_expr - - # calculate sample/spot mean and sd - cellColMean = colMeans(geneFold) - cellColSd = apply(geneFold, 2, stats::sd) - cellColMeanSd = data.table::data.table(cell_ID = names(cellColMean), - colmean = cellColMean, - colSd = cellColSd) - - filterSig_DT = data.table::as.data.table(reshape2::melt(filterSig)) - colnames(filterSig_DT) = c('gene', 'cell_type', 'marker') - sub_ct_DT = filterSig_DT[marker == 1] - sub_ct_DT[, nr_markers := .N, by = cell_type] - - ## reshape gene fold-expression - geneFold_DT = data.table::as.data.table(reshape2::melt(geneFold)) - colnames(geneFold_DT) = c('gene', 'cell_ID', 'fc') - - mergetest = data.table::merge.data.table(sub_ct_DT, geneFold_DT, by = 'gene') - mergetest = mergetest[, mean(fc), by = .(cell_type, cell_ID, nr_markers)] - if (is.integer(mergetest$cell_ID) && is.character(cellColMeanSd$cell_ID)){ - mergetest$cell_ID = as.character(mergetest$cell_ID) - } - mergetest = data.table::merge.data.table(mergetest, cellColMeanSd, by = 'cell_ID') - mergetest[, zscore := ((V1 - colmean)* nr_markers^(1/2)) / colSd] - - if(output_enrichment == 'zscore') { - mergetest[, zscore := scale(zscore), by = 'cell_type'] - } - - - - - ## return p-values based on permutations ## - if(p_value == TRUE) { - - ## 1. 
get number of markers instructions ## - sample_intrs = unique(sub_ct_DT[,.(cell_type, nr_markers)]) - +.page_dt_method <- function(sign_matrix, + expr_values, + min_overlap_genes = 5, + logbase = 2, + reverse_log_scale = TRUE, + output_enrichment = c("original", "zscore"), + p_value = FALSE, + include_depletion = FALSE, + n_times = 1000, + max_block = 20e6, + verbose = TRUE) { + # data.table variables + Var1 <- value <- Var2 <- V1 <- marker <- nr_markers <- fc <- cell_ID <- + zscore <- colmean <- colSd <- pval <- NULL + mean_zscore <- sd_zscore <- pval_score <- NULL + + # output enrichment + output_enrichment <- match.arg( + output_enrichment, choices = c("original", "zscore")) + + ## identify available cell types + all_genes <- rownames(expr_values) + sign_matrix <- as.matrix(sign_matrix) + sign_matrix_DT <- data.table::as.data.table(reshape2::melt(sign_matrix)) + sign_matrix_DT <- sign_matrix_DT[Var1 %in% all_genes] + detected_DT <- sign_matrix_DT[, sum(value), by = Var2] + + lost_cell_types_DT <- detected_DT[V1 <= min_overlap_genes] + if (nrow(lost_cell_types_DT) > 0) { + for (row in 1:nrow(lost_cell_types_DT)) { + output <- paste0( + "Warning, ", lost_cell_types_DT[row][["Var2"]], " only has ", + lost_cell_types_DT[row][["V1"]], + " overlapping genes. Will be removed.") + if (verbose) print(output) + } + } + available_ct <- as.character(detected_DT[V1 > min_overlap_genes][["Var2"]]) - ## 2. first create the random samples all together ## - cell_type_list = list() - perm_type_list = list() - for(row in 1:nrow(sample_intrs)) { + if (length(available_ct) == 1) { + stop("Only one cell type available.") + } - cell_type = sample_intrs[row][['cell_type']] - nr_genes = as.numeric(sample_intrs[row][['nr_markers']]) + # create subset of sinature matrix + interGene <- intersect(rownames(sign_matrix), rownames(expr_values)) + filterSig <- sign_matrix[interGene, available_ct] - gene_list = list() - perm_list = list() - for(i in 1:n_times) { - sampled_genes = sample(rownames(expr_values), size = nr_genes) - gene_list[[i]] = sampled_genes - perm_list[[i]] = rep(paste0('p_',i), nr_genes) - } + # create fold expression for each gene in each spot + # calculate mean gene expression + if (reverse_log_scale == TRUE) { + mean_gene_expr <- log(rowMeans(logbase^expr_values - 1, dims = 1) + 1) + } else { + mean_gene_expr <- rowMeans(expr_values) + } + geneFold <- expr_values - mean_gene_expr + + # calculate sample/spot mean and sd + cellColMean <- colMeans(geneFold) + cellColSd <- apply(geneFold, 2, stats::sd) + cellColMeanSd <- data.table::data.table( + cell_ID = names(cellColMean), + colmean = cellColMean, + colSd = cellColSd + ) - gene_res = unlist(gene_list) - names(gene_res) = rep(cell_type, length(gene_res)) - cell_type_list[[row]] = gene_res + filterSig_DT <- data.table::as.data.table(reshape2::melt(filterSig)) + colnames(filterSig_DT) <- c("gene", "cell_type", "marker") + sub_ct_DT <- filterSig_DT[marker == 1] + sub_ct_DT[, nr_markers := .N, by = cell_type] - perm_res = unlist(perm_list) - perm_type_list[[row]] = perm_res + ## reshape gene fold-expression + geneFold_DT <- data.table::as.data.table(reshape2::melt(geneFold)) + colnames(geneFold_DT) <- c("gene", "cell_ID", "fc") + mergetest <- data.table::merge.data.table( + sub_ct_DT, geneFold_DT, by = "gene") + mergetest <- mergetest[, mean(fc), by = .(cell_type, cell_ID, nr_markers)] + if (is.integer(mergetest$cell_ID) && is.character(cellColMeanSd$cell_ID)) { + mergetest$cell_ID <- as.character(mergetest$cell_ID) } + mergetest <- 
data.table::merge.data.table( + mergetest, cellColMeanSd, by = "cell_ID") + mergetest[, zscore := ((V1 - colmean) * nr_markers^(1 / 2)) / colSd] - cell_type_perm = unlist(cell_type_list) - perm_round = unlist(perm_type_list) + if (output_enrichment == "zscore") { + mergetest[, zscore := scale(zscore), by = "cell_type"] + } - cell_type_perm_DT = data.table::data.table(cell_type = names(cell_type_perm), - gene = cell_type_perm, - round = perm_round) - sample_intrs_vec = sample_intrs$nr_markers - names(sample_intrs_vec) = sample_intrs$cell_type - cell_type_perm_DT[, nr_markers := sample_intrs_vec[cell_type]] - ## 3. decide on number of blocks to process ## - nr_perm_lines = as.numeric(nrow(cell_type_perm_DT)) - nr_spots = as.numeric(ncol(expr_values)) - total_lines = nr_spots * nr_perm_lines - nr_groups = round(total_lines / max_block) + ## return p-values based on permutations ## + if (p_value == TRUE) { + ## 1. get number of markers instructions ## + sample_intrs <- unique(sub_ct_DT[, .(cell_type, nr_markers)]) - ## 4. create groups - all_perms = unique(perm_round) - all_perms_num = seq_along(all_perms) - names(all_perms_num) = all_perms - group_labels = paste0('group_',1:nr_groups) - groups_vec = cut(all_perms_num, breaks = nr_groups, labels = group_labels) - names(all_perms) = groups_vec + ## 2. first create the random samples all together ## + cell_type_list <- list() + perm_type_list <- list() + for (row in 1:nrow(sample_intrs)) { + cell_type <- sample_intrs[row][["cell_type"]] + nr_genes <- as.numeric(sample_intrs[row][["nr_markers"]]) - ## 5. do random enrichment per block - res_list = list() - for(group_i in seq_along(group_labels)) { + gene_list <- list() + perm_list <- list() + for (i in 1:n_times) { + sampled_genes <- sample(rownames(expr_values), size = nr_genes) + gene_list[[i]] <- sampled_genes + perm_list[[i]] <- rep(paste0("p_", i), nr_genes) + } - group = group_labels[group_i] - sub_perms = all_perms[names(all_perms) == group] - cell_type_perm_DT_sub = cell_type_perm_DT[round %in% sub_perms] + gene_res <- unlist(gene_list) + names(gene_res) <- rep(cell_type, length(gene_res)) + cell_type_list[[row]] <- gene_res - mergetest_perm_sub = data.table::merge.data.table(cell_type_perm_DT_sub, geneFold_DT, allow.cartesian = TRUE) - mergetest_perm_sub = mergetest_perm_sub[, mean(fc), by = .(cell_type, cell_ID, nr_markers, round)] - if (is.integer(mergetest_perm_sub$cell_ID) && is.character(cellColMeanSd$cell_ID)){ - mergetest_perm_sub$cell_ID = as.character(mergetest_perm_sub$cell_ID) - } - mergetest_perm_sub = data.table::merge.data.table(mergetest_perm_sub, cellColMeanSd, by = 'cell_ID') - mergetest_perm_sub[, zscore := ((V1 - colmean)* nr_markers^(1/2)) / colSd] + perm_res <- unlist(perm_list) + perm_type_list[[row]] <- perm_res + } - res_list[[group_i]] = mergetest_perm_sub + cell_type_perm <- unlist(cell_type_list) + perm_round <- unlist(perm_type_list) + + cell_type_perm_DT <- data.table::data.table( + cell_type = names(cell_type_perm), + gene = cell_type_perm, + round = perm_round + ) + + sample_intrs_vec <- sample_intrs$nr_markers + names(sample_intrs_vec) <- sample_intrs$cell_type + cell_type_perm_DT[, nr_markers := sample_intrs_vec[cell_type]] + + + ## 3. decide on number of blocks to process ## + nr_perm_lines <- as.numeric(nrow(cell_type_perm_DT)) + nr_spots <- as.numeric(ncol(expr_values)) + total_lines <- nr_spots * nr_perm_lines + nr_groups <- round(total_lines / max_block) + + ## 4. 
create groups + all_perms <- unique(perm_round) + all_perms_num <- seq_along(all_perms) + names(all_perms_num) <- all_perms + group_labels <- paste0("group_", 1:nr_groups) + groups_vec <- cut( + all_perms_num, breaks = nr_groups, labels = group_labels) + names(all_perms) <- groups_vec + + + ## 5. do random enrichment per block + res_list <- list() + for (group_i in seq_along(group_labels)) { + group <- group_labels[group_i] + sub_perms <- all_perms[names(all_perms) == group] + cell_type_perm_DT_sub <- cell_type_perm_DT[round %in% sub_perms] + + mergetest_perm_sub <- data.table::merge.data.table( + cell_type_perm_DT_sub, geneFold_DT, allow.cartesian = TRUE) + mergetest_perm_sub <- mergetest_perm_sub[ + , mean(fc), by = .(cell_type, cell_ID, nr_markers, round)] + if (is.integer(mergetest_perm_sub$cell_ID) && is.character( + cellColMeanSd$cell_ID)) { + mergetest_perm_sub$cell_ID <- as.character( + mergetest_perm_sub$cell_ID) + } + mergetest_perm_sub <- data.table::merge.data.table( + mergetest_perm_sub, cellColMeanSd, by = "cell_ID") + mergetest_perm_sub[, zscore := (( + V1 - colmean) * nr_markers^(1 / 2)) / colSd] + + res_list[[group_i]] <- mergetest_perm_sub + } - } + res_list_comb <- do.call("rbind", res_list) + res_list_comb_average <- res_list_comb[ + , .(mean_zscore = mean(zscore), sd_zscore = stats::sd(zscore)), + by = c("cell_ID", "cell_type")] + mergetest_final <- data.table::merge.data.table( + mergetest, res_list_comb_average, by = c("cell_ID", "cell_type")) + + ## calculate p.values based on normal distribution + if (include_depletion == TRUE) { + mergetest_final[, pval := stats::pnorm( + abs(zscore), mean = mean_zscore, sd = sd_zscore, + lower.tail = FALSE, log.p = FALSE)] + } else { + mergetest_final[, pval := stats::pnorm( + zscore, mean = mean_zscore, sd = sd_zscore, + lower.tail = FALSE, log.p = FALSE)] + } - res_list_comb = do.call('rbind', res_list) - res_list_comb_average = res_list_comb[, .(mean_zscore = mean(zscore), sd_zscore = stats::sd(zscore)), by = c('cell_ID', 'cell_type')] - mergetest_final = data.table::merge.data.table(mergetest, res_list_comb_average, by = c('cell_ID', 'cell_type')) + data.table::setorder(mergetest_final, pval) - ## calculate p.values based on normal distribution - if(include_depletion == TRUE) { - mergetest_final[, pval := stats::pnorm(abs(zscore), mean = mean_zscore, sd = sd_zscore, lower.tail = FALSE, log.p = FALSE)] - } else { - mergetest_final[, pval := stats::pnorm(zscore, mean = mean_zscore, sd = sd_zscore, lower.tail = FALSE, log.p = FALSE)] - } + ## calculate pval_score + if (include_depletion == TRUE) { + mergetest_final[, pval_score := sign(zscore) * -log10(pval)] + } else { + mergetest_final[, pval_score := -log10(pval)] + } - data.table::setorder(mergetest_final, pval) - ## calculate pval_score - if(include_depletion == TRUE) { - mergetest_final[, pval_score := sign(zscore)*-log10(pval)] + resultmatrix <- data.table::dcast( + mergetest_final, formula = cell_ID ~ cell_type, + value.var = "pval_score") + return(list(DT = mergetest_final, matrix = resultmatrix)) } else { - mergetest_final[, pval_score := -log10(pval)] + resultmatrix <- data.table::dcast( + mergetest, formula = cell_ID ~ cell_type, value.var = "zscore") + return(list(DT = mergetest, matrix = resultmatrix)) } - - - resultmatrix = data.table::dcast(mergetest_final, formula = cell_ID~cell_type, value.var = 'pval_score') - return(list(DT = mergetest_final, matrix = resultmatrix)) - - - } else { - - resultmatrix = data.table::dcast(mergetest, formula = cell_ID~cell_type, 
value.var = 'zscore') - return(list(DT = mergetest, matrix = resultmatrix)) - - } - } #' @title runPAGEEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using PAGE. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using PAGE. #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type #' @param sign_matrix Matrix of signature genes for each cell type / process #' @param expression_values expression values to use -#' @param min_overlap_genes minimum number of overlapping genes in sign_matrix required to calculate enrichment +#' @param min_overlap_genes minimum number of overlapping genes in sign_matrix +#' required to calculate enrichment #' @param reverse_log_scale reverse expression values from log scale #' @param logbase log base to use if reverse_log_scale = TRUE #' @param output_enrichment how to return enrichment output @@ -727,121 +776,140 @@ runPAGEEnrich_OLD <- function(gobject, #' @param return_gobject return giotto object #' @return data.table with enrichment results #' @details -#' sign_matrix: a binary matrix with genes as row names and cell-types as column names. -#' Alternatively a list of signature genes can be provided to makeSignMatrixPAGE, which will create -#' the matrix for you. \cr +#' sign_matrix: a binary matrix with genes as row names and cell-types as +#' column names. +#' Alternatively a list of signature genes can be provided to +#' makeSignMatrixPAGE, which will create the matrix for you. \cr #' #' The enrichment Z score is calculated by using method (PAGE) from -#' Kim SY et al., BMC bioinformatics, 2005 as \eqn{Z = ((Sm – mu)*m^(1/2)) / delta}. -#' For each gene in each spot, mu is the fold change values versus the mean expression -#' and delta is the standard deviation. Sm is the mean fold change value of a specific marker gene set -#' and m is the size of a given marker gene set. +#' Kim SY et al., BMC bioinformatics, 2005 as +#' \eqn{Z = ((Sm – mu)*m^(1/2)) / delta}. +#' For each gene in each spot, mu is the fold change values versus the mean +#' expression and delta is the standard deviation. Sm is the mean fold change +#' value of a specific marker gene set and m is the size of a given marker +#' gene set. 
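# A hedged usage sketch for runPAGEEnrich(), assuming GiottoData is installed,
# that the 'aggregate' spatial unit carries a 'normalized' matrix, and using
# arbitrary splits of real feature IDs as stand-in "cell types" rather than
# biological signatures.
g <- GiottoData::loadGiottoMini("vizgen")
activeSpatUnit(g) <- "aggregate"

feats <- rownames(getExpression(g, values = "normalized", output = "matrix"))
toy_sign <- makeSignMatrixPAGE(
    sign_names = c("typeA", "typeB"),
    sign_list  = list(feats[1:15], feats[16:30])
)

g <- runPAGEEnrich(g, sign_matrix = toy_sign, name = "PAGE_toy")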
#' @seealso \code{\link{makeSignMatrixPAGE}} #' @export -runPAGEEnrich = function(gobject, - spat_unit = NULL, - feat_type = NULL, - sign_matrix, - expression_values = c('normalized', 'scaled', 'custom'), - min_overlap_genes = 5, - reverse_log_scale = TRUE, - logbase = 2, - output_enrichment = c('original', 'zscore'), - p_value = FALSE, - include_depletion = FALSE, - n_times = 1000, - max_block = 20e6, - name = NULL, - verbose = TRUE, - return_gobject = TRUE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom'), expression_values)) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - # check parameters - if(is.null(name)) name = 'PAGE' - - PAGE_results = .page_dt_method(sign_matrix = sign_matrix, - expr_values = as.matrix(expr_values[]), - min_overlap_genes = min_overlap_genes, - logbase = logbase, - reverse_log_scale = reverse_log_scale, - output_enrichment = c('original', 'zscore'), - p_value = p_value, - include_depletion = include_depletion, - n_times = n_times, - max_block = max_block, - verbose = verbose) - - # create spatial enrichment object - enrObj = create_spat_enr_obj(name = name, - method = 'PAGE', - enrichDT = PAGE_results[['matrix']], - spat_unit = spat_unit, - feat_type = feat_type, - provenance = expr_values@provenance, - misc = list(expr_values_used = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - p_values_calculated = p_value, - output_enrichment_scores = output_enrichment, - include_depletion = include_depletion, - nr_permutations = n_times)) - - ## return object or results ## - if(return_gobject == TRUE) { - - spenr_names = list_spatial_enrichments_names(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - if(name %in% spenr_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_spatial_enrichment') - - # parameters to include - parameters_list[[update_name]] = c('method used' = 'PAGE', - 'enrichment name' = name, - 'expression values' = expression_values, - 'reverse log scale' = reverse_log_scale, - 'logbase' = logbase, - 'output enrichment scores' = output_enrichment, - 'p values calculated' = p_value, - 'include depletion' = include_depletion, - 'nr permutations' = n_times) - - gobject@parameters = parameters_list - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_spatial_enrichment(gobject = gobject, - spatenrichment = enrObj) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - - return(gobject) - - } else { - PAGE_results[['matrix']] = enrObj - return(PAGE_results) - } +runPAGEEnrich <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + sign_matrix, + expression_values = c("normalized", "scaled", "custom"), + min_overlap_genes = 5, + reverse_log_scale = TRUE, + logbase = 2, + output_enrichment = c("original", "zscore"), + p_value = FALSE, + include_depletion = FALSE, + n_times = 1000, + max_block = 20e6, + name = NULL, + verbose = TRUE, + return_gobject = TRUE) { + 
# Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom"), expression_values)) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + # check parameters + if (is.null(name)) name <- "PAGE" + + PAGE_results <- .page_dt_method( + sign_matrix = sign_matrix, + expr_values = as.matrix(expr_values[]), + min_overlap_genes = min_overlap_genes, + logbase = logbase, + reverse_log_scale = reverse_log_scale, + output_enrichment = c("original", "zscore"), + p_value = p_value, + include_depletion = include_depletion, + n_times = n_times, + max_block = max_block, + verbose = verbose + ) + + # create spatial enrichment object + enrObj <- create_spat_enr_obj( + name = name, + method = "PAGE", + enrichDT = PAGE_results[["matrix"]], + spat_unit = spat_unit, + feat_type = feat_type, + provenance = expr_values@provenance, + misc = list( + expr_values_used = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + p_values_calculated = p_value, + output_enrichment_scores = output_enrichment, + include_depletion = include_depletion, + nr_permutations = n_times + ) + ) + + ## return object or results ## + if (return_gobject == TRUE) { + spenr_names <- list_spatial_enrichments_names( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + if (name %in% spenr_names) { + cat(name, " has already been used, will be overwritten") + } + + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_spatial_enrichment") + + # parameters to include + parameters_list[[update_name]] <- c( + "method used" = "PAGE", + "enrichment name" = name, + "expression values" = expression_values, + "reverse log scale" = reverse_log_scale, + "logbase" = logbase, + "output enrichment scores" = output_enrichment, + "p values calculated" = p_value, + "include depletion" = include_depletion, + "nr permutations" = n_times + ) + + gobject@parameters <- parameters_list + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_spatial_enrichment( + gobject = gobject, + spatenrichment = enrObj + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + + return(gobject) + } else { + PAGE_results[["matrix"]] <- enrObj + return(PAGE_results) + } } @@ -851,16 +919,15 @@ runPAGEEnrich = function(gobject, #' @title PAGEEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using PAGE. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using PAGE. #' @inheritDotParams runPAGEEnrich #' @seealso \code{\link{runPAGEEnrich}} #' @export PAGEEnrich <- function(...) { + .Deprecated(new = "runPAGEEnrich") - .Deprecated(new = "runPAGEEnrich") - - runPAGEEnrich(...) - + runPAGEEnrich(...) } @@ -870,20 +937,22 @@ PAGEEnrich <- function(...) 
{ #' @title Rank permutation #' @description creates permutation for the rankEnrich test #' @keywords internal -.do_rank_permutation = function(sc_gene, n){ - random_df = data.frame(matrix(ncol = n, nrow = length(sc_gene))) - for (i in 1:n){ - set.seed(i) - random_rank = sample(seq_along(sc_gene), length(sc_gene), replace=FALSE) - random_df[,i] = random_rank - } - rownames(random_df) = sc_gene - return(random_df) +.do_rank_permutation <- function(sc_gene, n) { + random_df <- data.frame(matrix(ncol = n, nrow = length(sc_gene))) + for (i in 1:n) { + set.seed(i) + random_rank <- sample( + seq_along(sc_gene), length(sc_gene), replace = FALSE) + random_df[, i] <- random_rank + } + rownames(random_df) <- sc_gene + return(random_df) } #' @title runRankEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using a rank based approach. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using a rank based approach. #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -901,225 +970,251 @@ PAGEEnrich <- function(...) { #' @param return_gobject return giotto object #' @return data.table with enrichment results #' @details -#' sign_matrix: a rank-fold matrix with genes as row names and cell-types as column names. -#' Alternatively a scRNA-seq matrix and vector with clusters can be provided to makeSignMatrixRank, which will create -#' the matrix for you. \cr +#' sign_matrix: a rank-fold matrix with genes as row names and cell-types as +#' column names. +#' Alternatively a scRNA-seq matrix and vector with clusters can be provided +#' to makeSignMatrixRank, which will create the matrix for you. \cr #' #' First a new rank is calculated as R = (R1*R2)^(1/2), where R1 is the rank of -#' fold-change for each gene in each spot and R2 is the rank of each marker in each cell type. -#' The Rank-Biased Precision is then calculated as: RBP = (1 - 0.99) * (0.99)^(R - 1) +#' fold-change for each gene in each spot and R2 is the rank of each marker in +#' each cell type. +#' The Rank-Biased Precision is then calculated as: +#' RBP = (1 - 0.99) * (0.99)^(R - 1) #' and the final enrichment score is then calculated as the sum of top 100 RBPs. 
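# A minimal arithmetic sketch of the rank/RBP score described above, with
# simulated ranks R1 and R2 and the defaults rbp_p = 0.99 and num_agg = 100.
set.seed(7)
n_genes <- 500
R1 <- sample(n_genes)   # rank of fold change for each gene in one spot
R2 <- sample(n_genes)   # rank of each marker within one cell type

R   <- sqrt(R1 * R2)                 # combined rank, R = (R1 * R2)^(1/2)
rbp <- (1 - 0.99) * 0.99^(R - 1)     # rank-biased precision per gene

enrichment_score <- sum(sort(rbp, decreasing = TRUE)[1:100])   # sum of top 100 RBPs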
#' @seealso \code{\link{makeSignMatrixRank}} #' @export -runRankEnrich = function(gobject, - spat_unit = NULL, - feat_type = NULL, - sign_matrix, - expression_values = c('normalized', "raw", 'scaled', 'custom'), - reverse_log_scale = TRUE, - logbase = 2, - output_enrichment = c('original', 'zscore'), - ties_method = c("average", "max"), - p_value = FALSE, - n_times = 1000, - rbp_p = 0.99, - num_agg = 100, - name = NULL, - return_gobject = TRUE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # determine ties.method - ties_method = match.arg(ties_method, choices = c("average", "max")) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - if(values == "raw"){ - expr_values[] = Matrix::as.matrix(expr_values[]) - } - - # check parameters - if(is.null(name)) name = 'rank' - - #check gene list - interGene = intersect(rownames(sign_matrix), rownames(expr_values[])) - if (length(interGene)<100){ - stop("Please check the gene numbers or names of scRNA-seq. The names of scRNA-seq should be consistent with spatial data.") - } - - # output enrichment - output_enrichment = match.arg(output_enrichment, choices = c('original', 'zscore')) - - enrichment = matrix(data = NA, - nrow = dim(sign_matrix)[2], - ncol = dim(expr_values[])[2]) - - # calculate mean gene expression - if(reverse_log_scale == TRUE) { - mean_gene_expr = log(Matrix::rowMeans(logbase^expr_values[]-1, dims = 1)+1) - } else { - mean_gene_expr = Matrix::rowMeans(expr_values[]) - } - - # fold change and ranking - #geneFold = expr_values - mean_gene_expr - #rankFold = t(matrixStats::colRanks(-geneFold, ties.method = "first")) - - ties_1 = ties_method - ties_2 = ties_method - if(ties_method == "max"){ - ties_1 = "min" - ties_2 = "max" - } - #else ties_1=ties_2 is equal to random - geneFold = expr_values[] - geneFold = sparseMatrixStats::rowRanks(geneFold, ties.method = ties_1) - rankFold = t(sparseMatrixStats::colRanks(-geneFold, ties.method = ties_2)) - - rownames(rankFold) = rownames(expr_values[]) - colnames(rankFold) = colnames(expr_values[]) - - for (i in (1:dim(sign_matrix)[2])){ - - signames = rownames(sign_matrix)[which(sign_matrix[,i]>0)] - interGene = intersect(signames, rownames(rankFold)) - filterSig = sign_matrix[interGene,] - filterRankFold = rankFold[interGene,] - - multiplyRank = (filterRankFold*filterSig[,i])^(1/2) - rpb = (1.0 - rbp_p)*(rbp_p^(multiplyRank-1)) - - vectorX = rep(NA, dim(filterRankFold)[2]) - - for (j in (1:dim(filterRankFold)[2])){ - toprpb = sort(rpb[,j],decreasing = T) - zscore = sum(toprpb[1:num_agg]) - vectorX[j] = zscore - } - enrichment[i,] = vectorX - } - - rownames(enrichment) = colnames(sign_matrix) - colnames(enrichment) = colnames(rankFold) - - enrichment = t(enrichment) - - if(output_enrichment == 'zscore') { - enrichment = scale(enrichment) - } - - enrichmentDT = data.table::data.table(cell_ID = rownames(enrichment)) - enrichmentDT = cbind(enrichmentDT, data.table::as.data.table(enrichment)) - - - # default name for page enrichment - - if(isTRUE(p_value)){ - random_rank = .do_rank_permutation(sc_gene = rownames(sign_matrix), - n = n_times) - - random_DT = runRankEnrich(gobject = gobject, - 
spat_unit = spat_unit, - feat_type = feat_type, - sign_matrix = random_rank, - expression_values = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - output_enrichment = output_enrichment, - p_value = FALSE) - - background = unlist(random_DT[,2:dim(random_DT)[2]]) - fit.gamma = fitdistrplus::fitdist(background, distr = "gamma", method = "mle") - pvalue_DT = enrichmentDT - enrichmentDT[,2:dim(enrichmentDT)[2]] = lapply(enrichmentDT[,2:dim(enrichmentDT)[2]], function(x) - {stats::pgamma(x, fit.gamma$estimate[1], rate = fit.gamma$estimate[2], lower.tail = FALSE, log.p = FALSE)}) - } - - # create spatial enrichment object - enrObj = create_spat_enr_obj(name = name, - method = 'rank', - enrichDT = enrichmentDT, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = expr_values@provenance, - misc = list(expr_values_used = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - p_values_calculated = p_value, - output_enrichment_scores = output_enrichment, - nr_permutations = n_times)) - - ## return object or results ## - if(return_gobject == TRUE) { - - spenr_names = list_spatial_enrichments_names(gobject = gobject, spat_unit = spat_unit, feat_type = feat_type) - - - if(name %in% spenr_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_spatial_enrichment') - - # parameters to include - parameters_list[[update_name]] = c('method used' = 'rank', - 'enrichment name' = name, - 'expression values' = expression_values, - 'reverse log scale' = reverse_log_scale, - 'logbase' = logbase, - 'p-values calculated' = p_value, - 'output enrichment scores' = output_enrichment, - 'p values calculated' = p_value, - 'nr permutations' = n_times) - gobject@parameters = parameters_list +runRankEnrich <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + sign_matrix, + expression_values = c("normalized", "raw", "scaled", "custom"), + reverse_log_scale = TRUE, + logbase = 2, + output_enrichment = c("original", "zscore"), + ties_method = c("average", "max"), + p_value = FALSE, + n_times = 1000, + rbp_p = 0.99, + num_agg = 100, + name = NULL, + return_gobject = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_spatial_enrichment(gobject = gobject, - spatenrichment = enrObj) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + # determine ties.method + ties_method <- match.arg(ties_method, choices = c("average", "max")) + + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + if (values == "raw") { + expr_values[] <- Matrix::as.matrix(expr_values[]) + } + + # check parameters + if (is.null(name)) name <- "rank" + + # check gene list + interGene <- intersect(rownames(sign_matrix), rownames(expr_values[])) + if (length(interGene) < 100) { + stop("Please check the gene numbers or names of scRNA-seq. 
The names + of scRNA-seq should be consistent with spatial data.") + } + + # output enrichment + output_enrichment <- match.arg( + output_enrichment, choices = c("original", "zscore")) + + enrichment <- matrix( + data = NA, + nrow = dim(sign_matrix)[2], + ncol = dim(expr_values[])[2] + ) + + # calculate mean gene expression + if (reverse_log_scale == TRUE) { + mean_gene_expr <- log(Matrix::rowMeans( + logbase^expr_values[] - 1, dims = 1) + 1) + } else { + mean_gene_expr <- Matrix::rowMeans(expr_values[]) + } + + # fold change and ranking + + ties_1 <- ties_method + ties_2 <- ties_method + if (ties_method == "max") { + ties_1 <- "min" + ties_2 <- "max" + } + # else ties_1=ties_2 is equal to random + geneFold <- expr_values[] + geneFold <- sparseMatrixStats::rowRanks(geneFold, ties.method = ties_1) + rankFold <- t(sparseMatrixStats::colRanks(-geneFold, ties.method = ties_2)) + + rownames(rankFold) <- rownames(expr_values[]) + colnames(rankFold) <- colnames(expr_values[]) + + for (i in (1:dim(sign_matrix)[2])) { + signames <- rownames(sign_matrix)[which(sign_matrix[, i] > 0)] + interGene <- intersect(signames, rownames(rankFold)) + filterSig <- sign_matrix[interGene, ] + filterRankFold <- rankFold[interGene, ] + + multiplyRank <- (filterRankFold * filterSig[, i])^(1 / 2) + rpb <- (1.0 - rbp_p) * (rbp_p^(multiplyRank - 1)) + + vectorX <- rep(NA, dim(filterRankFold)[2]) + + for (j in (1:dim(filterRankFold)[2])) { + toprpb <- sort(rpb[, j], decreasing = TRUE) + zscore <- sum(toprpb[1:num_agg]) + vectorX[j] <- zscore + } + enrichment[i, ] <- vectorX + } + + rownames(enrichment) <- colnames(sign_matrix) + colnames(enrichment) <- colnames(rankFold) - return(gobject) - - } else { - return(enrObj) - } + enrichment <- t(enrichment) + if (output_enrichment == "zscore") { + enrichment <- scale(enrichment) + } + + enrichmentDT <- data.table::data.table(cell_ID = rownames(enrichment)) + enrichmentDT <- cbind(enrichmentDT, data.table::as.data.table(enrichment)) + + + # default name for page enrichment + + if (isTRUE(p_value)) { + random_rank <- .do_rank_permutation( + sc_gene = rownames(sign_matrix), + n = n_times + ) + + random_DT <- runRankEnrich( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sign_matrix = random_rank, + expression_values = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + output_enrichment = output_enrichment, + p_value = FALSE + ) + + background <- unlist(random_DT[, 2:dim(random_DT)[2]]) + fit.gamma <- fitdistrplus::fitdist( + background, distr = "gamma", method = "mle") + pvalue_DT <- enrichmentDT + enrichmentDT[, 2:dim(enrichmentDT)[2]] <- lapply( + enrichmentDT[, 2:dim(enrichmentDT)[2]], function(x) { + stats::pgamma( + x, fit.gamma$estimate[1], rate = fit.gamma$estimate[2], + lower.tail = FALSE, log.p = FALSE) + }) + } + + # create spatial enrichment object + enrObj <- create_spat_enr_obj( + name = name, + method = "rank", + enrichDT = enrichmentDT, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = expr_values@provenance, + misc = list( + expr_values_used = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + p_values_calculated = p_value, + output_enrichment_scores = output_enrichment, + nr_permutations = n_times + ) + ) + + ## return object or results ## + if (return_gobject == TRUE) { + spenr_names <- list_spatial_enrichments_names( + gobject = gobject, spat_unit = spat_unit, feat_type = feat_type) + + + if (name %in% spenr_names) { + cat(name, " has already been used, will 
be overwritten") + } + + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_spatial_enrichment") + + # parameters to include + parameters_list[[update_name]] <- c( + "method used" = "rank", + "enrichment name" = name, + "expression values" = expression_values, + "reverse log scale" = reverse_log_scale, + "logbase" = logbase, + "p-values calculated" = p_value, + "output enrichment scores" = output_enrichment, + "p values calculated" = p_value, + "nr permutations" = n_times + ) + gobject@parameters <- parameters_list + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_spatial_enrichment( + gobject = gobject, + spatenrichment = enrObj + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + return(gobject) + } else { + return(enrObj) + } } #' @title rankEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using a rank based approach. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using a rank based approach. #' @inheritDotParams runRankEnrich #' @seealso \code{\link{runRankEnrich}} #' @export rankEnrich <- function(...) { + .Deprecated(new = "runRankEnrich") - .Deprecated(new = "runRankEnrich") - - runRankEnrich(...) - + runRankEnrich(...) } #' @title runHyperGeometricEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using a hypergeometric test. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using a hypergeometric test. #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -1127,7 +1222,8 @@ rankEnrich <- function(...) { #' @param expression_values expression values to use #' @param reverse_log_scale reverse expression values from log scale #' @param logbase log base to use if reverse_log_scale = TRUE -#' @param top_percentage percentage of cells that will be considered to have gene expression with matrix binarization +#' @param top_percentage percentage of cells that will be considered to have +#' gene expression with matrix binarization #' @param output_enrichment how to return enrichment output #' @param p_value calculate p-values (boolean, default = FALSE) #' @param name to give to spatial enrichment results, default = hypergeometric @@ -1136,177 +1232,194 @@ rankEnrich <- function(...) { #' @details The enrichment score is calculated based on the p-value from the #' hypergeometric test, -log10(p-value). 
#' @export -runHyperGeometricEnrich = function(gobject, - spat_unit = NULL, - feat_type = NULL, - sign_matrix, - expression_values = c('normalized', 'scaled', 'custom'), - reverse_log_scale = TRUE, - logbase = 2, - top_percentage = 5, - output_enrichment = c('original', 'zscore'), - p_value = FALSE, - name = NULL, - return_gobject = TRUE) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - ## temporary ## - # if(!'matrix' %in% class(expr_values)) { - # warning('The expression matrix is not stored as a base matrix and will be changed to a base matrix object. \n - # This will be updated in the future') - # expr_values = as.matrix(expr_values) - #} - - - # check parameters - if(is.null(name)) name = 'hypergeometric' - - # output enrichment - output_enrichment = match.arg(output_enrichment, choices = c('original', 'zscore')) - - # calculate mean gene expression - if(reverse_log_scale == TRUE) { - expr_values[] = logbase^expr_values[]-1 - } - - interGene = intersect(rownames(expr_values[]),rownames(sign_matrix)) - - inter_sign_matrix = sign_matrix[interGene,] - - aveExp = log2(2*(Matrix::rowMeans(2^(expr_values[]-1), dims = 1))+1) - - foldChange = expr_values[]-aveExp - - top_q = 1-top_percentage/100 - quantilecut = apply(foldChange, 2 , stats::quantile , probs = top_q, na.rm = TRUE ) - expbinary = t_flex(1* t_flex(foldChange > quantilecut)) - - markerGenes = rownames(inter_sign_matrix) - expbinaryOverlap = expbinary[markerGenes,] - total = length(markerGenes) - enrichment = matrix(data=NA, - nrow = dim(inter_sign_matrix)[2], - ncol=dim(expbinaryOverlap)[2]) - - for (i in (1:dim(inter_sign_matrix)[2])){ - signames = rownames(inter_sign_matrix)[which(inter_sign_matrix[,i]==1)] - vectorX = NULL - - for (j in(1:dim(expbinaryOverlap)[2])){ - - cellsiggene = names(expbinaryOverlap[which(expbinaryOverlap[,j]==1),j]) - x = length(intersect(cellsiggene,signames)) - m = length(rownames(inter_sign_matrix)[which(inter_sign_matrix[,i]==1)]) - n = total-m - k = length(intersect(cellsiggene, markerGenes)) - enrich<-(0-log10(stats::phyper(x, m, n, k, log = FALSE,lower.tail = FALSE))) - vectorX = append(vectorX,enrich) - } - enrichment[i,] = vectorX - } - - rownames(enrichment) = colnames(inter_sign_matrix) - colnames(enrichment) = colnames(expbinaryOverlap) - - enrichment = t(enrichment) - - if(output_enrichment == 'zscore') { - enrichment = scale(enrichment) - } - - enrichmentDT = data.table::data.table(cell_ID = rownames(enrichment)) - enrichmentDT = cbind(enrichmentDT, data.table::as.data.table(enrichment)) - - - ## calculate p-values ## - if (p_value == TRUE){ - enrichmentDT[,2:dim(enrichmentDT)[2]] = lapply(enrichmentDT[,2:dim(enrichmentDT)[2]],function(x){10^(-x)}) - } - - # create spatial enrichment object - enrObj = create_spat_enr_obj(name = name, - method = 'hypergeometric', - enrichDT = enrichmentDT, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = expr_values@provenance, - misc = list(expr_values_used = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - top_percentage = top_percentage, - p_values_calculated = p_value, - 
output_enrichment_scores = output_enrichment)) - - ## return object or results ## - if(return_gobject == TRUE) { - - spenr_names = list_spatial_enrichments_names(gobject = gobject, spat_unit = spat_unit, feat_type = feat_type) - - if(name %in% spenr_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_spatial_enrichment') +runHyperGeometricEnrich <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + sign_matrix, + expression_values = c("normalized", "scaled", "custom"), + reverse_log_scale = TRUE, + logbase = 2, + top_percentage = 5, + output_enrichment = c("original", "zscore"), + p_value = FALSE, + name = NULL, + return_gobject = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + + # check parameters + if (is.null(name)) name <- "hypergeometric" + + # output enrichment + output_enrichment <- match.arg( + output_enrichment, choices = c("original", "zscore")) + + # calculate mean gene expression + if (reverse_log_scale == TRUE) { + expr_values[] <- logbase^expr_values[] - 1 + } - # parameters to include - parameters_list[[update_name]] = c('method used' = 'hypergeometric', - 'enrichment name' = name, - 'expression values' = expression_values, - 'reverse log scale' = reverse_log_scale, - 'logbase' = logbase, - 'top percentage' = top_percentage, - 'p-values calculated' = p_value, - 'output enrichment scores' = output_enrichment, - 'p values calculated' = p_value) - gobject@parameters = parameters_list + interGene <- intersect(rownames(expr_values[]), rownames(sign_matrix)) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_spatial_enrichment(gobject = gobject, - spatenrichment = enrObj) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + inter_sign_matrix <- sign_matrix[interGene, ] - return(gobject) - - } else { - return(enrObj) - } + aveExp <- log2(2 * (Matrix::rowMeans(2^(expr_values[] - 1), dims = 1)) + 1) + + foldChange <- expr_values[] - aveExp + + top_q <- 1 - top_percentage / 100 + quantilecut <- apply( + foldChange, 2, stats::quantile, probs = top_q, na.rm = TRUE) + expbinary <- t_flex(1 * t_flex(foldChange > quantilecut)) + + markerGenes <- rownames(inter_sign_matrix) + expbinaryOverlap <- expbinary[markerGenes, ] + total <- length(markerGenes) + enrichment <- matrix( + data = NA, + nrow = dim(inter_sign_matrix)[2], + ncol = dim(expbinaryOverlap)[2] + ) + + for (i in (1:dim(inter_sign_matrix)[2])) { + signames <- rownames(inter_sign_matrix)[ + which(inter_sign_matrix[, i] == 1)] + vectorX <- NULL + + for (j in (1:dim(expbinaryOverlap)[2])) { + cellsiggene <- names(expbinaryOverlap[ + which(expbinaryOverlap[, j] == 1), j]) + x <- length(intersect(cellsiggene, signames)) + m <- length(rownames(inter_sign_matrix)[which( + inter_sign_matrix[, i] == 1)]) + n <- total - m + k <- length(intersect(cellsiggene, markerGenes)) + enrich <- (0 - log10(stats::phyper( + x, m, n, k, log.p = 
FALSE, lower.tail = FALSE))) + vectorX <- append(vectorX, enrich) + } + enrichment[i, ] <- vectorX + } + + rownames(enrichment) <- colnames(inter_sign_matrix) + colnames(enrichment) <- colnames(expbinaryOverlap) + + enrichment <- t(enrichment) + + if (output_enrichment == "zscore") { + enrichment <- scale(enrichment) + } + + enrichmentDT <- data.table::data.table(cell_ID = rownames(enrichment)) + enrichmentDT <- cbind(enrichmentDT, data.table::as.data.table(enrichment)) + + + ## calculate p-values ## + if (p_value == TRUE) { + enrichmentDT[, 2:dim(enrichmentDT)[2]] <- lapply( + enrichmentDT[, 2:dim(enrichmentDT)[2]], function(x) { + 10^(-x) + }) + } + + # create spatial enrichment object + enrObj <- create_spat_enr_obj( + name = name, + method = "hypergeometric", + enrichDT = enrichmentDT, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = expr_values@provenance, + misc = list( + expr_values_used = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + top_percentage = top_percentage, + p_values_calculated = p_value, + output_enrichment_scores = output_enrichment + ) + ) + + ## return object or results ## + if (return_gobject == TRUE) { + spenr_names <- list_spatial_enrichments_names( + gobject = gobject, spat_unit = spat_unit, feat_type = feat_type) + + if (name %in% spenr_names) { + cat(name, " has already been used, will be overwritten") + } + + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_spatial_enrichment") + + # parameters to include + parameters_list[[update_name]] <- c( + "method used" = "hypergeometric", + "enrichment name" = name, + "expression values" = expression_values, + "reverse log scale" = reverse_log_scale, + "logbase" = logbase, + "top percentage" = top_percentage, + "p-values calculated" = p_value, + "output enrichment scores" = output_enrichment, + "p values calculated" = p_value + ) + gobject@parameters <- parameters_list + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_spatial_enrichment( + gobject = gobject, + spatenrichment = enrObj + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + return(gobject) + } else { + return(enrObj) + } } #' @title hyperGeometricEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using a hypergeometric test. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using a hypergeometric test. #' @inheritDotParams runHyperGeometricEnrich #' @seealso \code{\link{runHyperGeometricEnrich}} #' @export hyperGeometricEnrich <- function(...) { + .Deprecated(new = "runHyperGeometricEnrich") - .Deprecated(new = "runHyperGeometricEnrich") - - runHyperGeometricEnrich(...) - + runHyperGeometricEnrich(...) } #' @title runSpatialEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using an enrichment test. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using an enrichment test. #' @param gobject Giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -1314,14 +1427,17 @@ hyperGeometricEnrich <- function(...) 
{ #' @param sign_matrix Matrix of signature genes for each cell type / process #' @param expression_values expression values to use #' @param reverse_log_scale reverse expression values from log scale -#' @param min_overlap_genes minimum number of overlapping genes in sign_matrix required to calculate enrichment (PAGE) +#' @param min_overlap_genes minimum number of overlapping genes in sign_matrix +#' required to calculate enrichment (PAGE) #' @param logbase log base to use if reverse_log_scale = TRUE #' @param p_value calculate p-value (default = FALSE) -#' @param n_times (page/rank) number of permutation iterations to calculate p-value +#' @param n_times (page/rank) number of permutation iterations to calculate +#' p-value #' @param rbp_p (rank) fractional binarization threshold (default = 0.99) #' @param num_agg (rank) number of top genes to aggregate (default = 100) #' @param max_block number of lines to process together (default = 20e6) -#' @param top_percentage (hyper) percentage of cells that will be considered to have gene expression with matrix binarization +#' @param top_percentage (hyper) percentage of cells that will be considered +#' to have gene expression with matrix binarization #' @param output_enrichment how to return enrichment output #' @param name to give to spatial enrichment results, default = PAGE #' @param verbose be verbose @@ -1335,96 +1451,95 @@ hyperGeometricEnrich <- function(...) { #' } #' #' @export -runSpatialEnrich = function(gobject, - spat_unit = NULL, - feat_type = NULL, - enrich_method = c('PAGE', 'rank', 'hypergeometric'), - sign_matrix, - expression_values = c('normalized', 'scaled', 'custom'), - min_overlap_genes = 5, - reverse_log_scale = TRUE, - logbase = 2, - p_value = FALSE, - n_times = 1000, - rbp_p = 0.99, - num_agg = 100, - max_block = 20e6, - top_percentage = 5, - output_enrichment = c('original', 'zscore'), - name = NULL, - verbose = TRUE, - return_gobject = TRUE) { - - - enrich_method = match.arg(enrich_method, choices = c('PAGE', 'rank', 'hypergeometric')) - - output_enrichment = match.arg(output_enrichment, choices = c('original', 'zscore')) - - - if(enrich_method == 'PAGE') { - - results = runPAGEEnrich(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sign_matrix = sign_matrix, - expression_values = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - output_enrichment = output_enrichment, - p_value = p_value, - n_times = n_times, - name = name, - return_gobject = return_gobject) - - } else if(enrich_method == 'rank') { - - results = runRankEnrich(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sign_matrix = sign_matrix, - expression_values = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - output_enrichment = output_enrichment, - p_value = p_value, - n_times = n_times, - rbp_p = rbp_p, - num_agg = num_agg, - name = name, - return_gobject = return_gobject) - - - } else if(enrich_method == 'hypergeometric'){ - - results = runHyperGeometricEnrich(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - sign_matrix = sign_matrix, - expression_values = expression_values, - reverse_log_scale = reverse_log_scale, - logbase = logbase, - top_percentage = top_percentage, - output_enrichment = output_enrichment, - p_value = p_value, - name = name, - return_gobject = return_gobject) - } - - return(results) +runSpatialEnrich <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + enrich_method = c("PAGE", "rank", 
"hypergeometric"), + sign_matrix, + expression_values = c("normalized", "scaled", "custom"), + min_overlap_genes = 5, + reverse_log_scale = TRUE, + logbase = 2, + p_value = FALSE, + n_times = 1000, + rbp_p = 0.99, + num_agg = 100, + max_block = 20e6, + top_percentage = 5, + output_enrichment = c("original", "zscore"), + name = NULL, + verbose = TRUE, + return_gobject = TRUE) { + enrich_method <- match.arg( + enrich_method, choices = c("PAGE", "rank", "hypergeometric")) + + output_enrichment <- match.arg( + output_enrichment, choices = c("original", "zscore")) + + + if (enrich_method == "PAGE") { + results <- runPAGEEnrich( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sign_matrix = sign_matrix, + expression_values = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + output_enrichment = output_enrichment, + p_value = p_value, + n_times = n_times, + name = name, + return_gobject = return_gobject + ) + } else if (enrich_method == "rank") { + results <- runRankEnrich( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sign_matrix = sign_matrix, + expression_values = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + output_enrichment = output_enrichment, + p_value = p_value, + n_times = n_times, + rbp_p = rbp_p, + num_agg = num_agg, + name = name, + return_gobject = return_gobject + ) + } else if (enrich_method == "hypergeometric") { + results <- runHyperGeometricEnrich( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + sign_matrix = sign_matrix, + expression_values = expression_values, + reverse_log_scale = reverse_log_scale, + logbase = logbase, + top_percentage = top_percentage, + output_enrichment = output_enrichment, + p_value = p_value, + name = name, + return_gobject = return_gobject + ) + } + + return(results) } #' @title createSpatialEnrich -#' @description Function to calculate gene signature enrichment scores per spatial position using an enrichment test. +#' @description Function to calculate gene signature enrichment scores per +#' spatial position using an enrichment test. #' @inheritDotParams runSpatialEnrich #' @seealso \code{\link{runSpatialEnrich}} #' @export createSpatialEnrich <- function(...) { + .Deprecated(new = "runSpatialEnrich") - .Deprecated(new = "runSpatialEnrich") - - runSpatialEnrich(...) - + runSpatialEnrich(...) } @@ -1441,39 +1556,41 @@ createSpatialEnrich <- function(...) { #' @param feats features (expression) on which to run autocorrelation. #' (leaving as NULL means that all features will be tested) #' @param method method of autocorrelation. 
See details (default = 'moran') -#' @param data_to_use if using data from gobject, whether to test using expression -#' ('expression') or cell metadata ('cell_meta') +#' @param data_to_use if using data from gobject, whether to test using +#' expression ('expression') or cell metadata ('cell_meta') #' @param expression_values name of expression information to use #' @param meta_cols columns in cell metadata to test #' @param spatial_network_to_use spatial network to use -#' @param wm_method type of weight matrix to generate from spatial network if no -#' weight matrix is found attached to the spatial network +#' @param wm_method type of weight matrix to generate from spatial network +#' if no weight matrix is found attached to the spatial network #' @param wm_name name of attached weight matrix to use -#' @param node_values alternative method of directly supplying a set of node values -#' @param weight_matrix alternative method of directly supplying a spatial weight -#' matrix +#' @param node_values alternative method of directly supplying a set of node +#' values +#' @param weight_matrix alternative method of directly supplying a spatial +#' weight matrix #' @param test_method method to test values for significance (default is no #' testing) #' @param verbose be verbose -#' @description Find spatial autocorrelation. Note that \code{spatialAutoCorGlobal} -#' will return values as a data.table instead of appending information to the gobject. -#' \code{spatialAutoCorLocal} will append the results as a spatial enrichment object -#' by default. \cr -#' If providing external data using either the \code{node_values} and/or \code{weight_matrix} -#' params, the order of values provided should be the same as the ordering of the -#' columns and rows of the weight matrix. +#' @description Find spatial autocorrelation. Note that +#' \code{spatialAutoCorGlobal} will return values as a data.table instead of +#' appending information to the gobject. +#' \code{spatialAutoCorLocal} will append the results as a spatial enrichment +#' object by default. \cr +#' If providing external data using either the \code{node_values} and/or +#' \code{weight_matrix} params, the order of values provided should be the +#' same as the ordering of the columns and rows of the weight matrix. 
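#'
#' A hedged usage sketch of the two entry points described above
#' (illustrative only, not part of the original diff; `g` is an assumed
#' Giotto object that already carries a 'kNN_network' spatial network):
#'
#' # global Moran's I per feature, returned as a data.table
#' res <- spatialAutoCorGlobal(gobject = g,
#'                             method = "moran",
#'                             spatial_network_to_use = "kNN_network")
#'
#' # local Getis-Ord gi*, attached to `g` as a spatial enrichment object
#' g <- spatialAutoCorLocal(gobject = g,
#'                          method = "gi*",
#'                          spatial_network_to_use = "kNN_network")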
NULL # internals for spatial autocorrelation using terra #' @keywords internal -.spat_autocor_terra_numeric = function(x, w, method) { - return(terra::autocor(x = x, w = w, method = method)) +.spat_autocor_terra_numeric <- function(x, w, method) { + return(terra::autocor(x = x, w = w, method = method)) } #' @keywords internal -.spat_autocor_terra_raster = function(x, w, global = TRUE, method) { - return(terra::autocor(x = x, w = w, global = global, method = method)) +.spat_autocor_terra_raster <- function(x, w, global = TRUE, method) { + return(terra::autocor(x = x, w = w, global = global, method = method)) } @@ -1482,8 +1599,8 @@ NULL #' @describeIn spatialAutoCor Global autocorrelation (single value returned) #' -#' @param mc_nsim when \code{test_method = 'monte_carlo'} this is number of simulations -#' to perform +#' @param mc_nsim when \code{test_method = 'monte_carlo'} this is number of +#' simulations to perform #' @param cor_name name to assign the results in global autocorrelation output #' @param return_gobject (default = FALSE) whether to return results appended to #' metadata in the giotto object or as a data.table @@ -1494,147 +1611,157 @@ NULL #' \item{\emph{Geary's C} 'geary'} #' } #' @export -spatialAutoCorGlobal = function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL, - feats = NULL, - method = c('moran', 'geary'), - data_to_use = c('expression', 'cell_meta'), - expression_values = c('normalized', 'scaled', 'custom'), - meta_cols = NULL, - spatial_network_to_use = 'kNN_network', - wm_method = c('distance', 'adjacency'), - wm_name = 'spat_weights', - node_values = NULL, - weight_matrix = NULL, - test_method = c('none', 'monte_carlo'), - mc_nsim = 99, - cor_name = NULL, - return_gobject = FALSE, - verbose = TRUE) { - - # 0. determine inputs - method = match.arg(method, choices = c('moran', 'geary')) - test_method = match.arg(test_method, choices = c('none', 'monte_carlo')) - data_to_use = match.arg(data_to_use, choices = c('expression', 'cell_meta')) - if(is.null(cor_name)) cor_name = method - if(!is.null(node_values)) { - if(is.numeric(node_values)) stop(wrap_txt('External "node_values" must be type numeric.', - errWidth = TRUE)) - } - - use_ext_vals = data.table::fifelse(!is.null(node_values), yes = TRUE, no = FALSE) - use_sn = data.table::fifelse(!is.null(weight_matrix), yes = FALSE, no = TRUE) - - use_expr = data.table::fcase( - isTRUE(use_ext_vals), FALSE, - data_to_use != 'expression', FALSE, - default = TRUE - ) - - use_meta = data.table::fcase( - isTRUE(use_ext_vals), FALSE, - data_to_use != 'cell_meta', FALSE, - default = TRUE - ) - - if(data_to_use == 'cell_meta') { - if(is.null(meta_cols)) { - stop(wrap_txt( - 'If "data_to_use" is "cell_meta" then a character vector of cell metadata', - 'columns to use must be provided in "meta_cols"', - errWidth = TRUE - )) - } - } - if(isTRUE(return_gobject)) { - if(data_to_use == 'cell_meta' | isTRUE(use_ext_vals)) { - stop(wrap_txt( - 'Global spatial autocorrelations on cell_meta or external data should not', - 'be returned to the gobject. - > Please set return_gobject = FALSE', - errWidth = TRUE - )) - } - } - - # 1. 
setup - if(!is.null(gobject)) { - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - } else { # if null - if(any(!use_ext_vals, use_sn, return_gobject)) { - stop('gobject has not been provided\n') - } - } - - # select and format input - data_list = .evaluate_autocor_input(gobject = gobject, - use_ext_vals = use_ext_vals, - use_sn = use_sn, - use_expr = use_expr, - use_meta = use_meta, - spat_unit = spat_unit, - feat_type = feat_type, - feats = feats, - data_to_use = data_to_use, - expression_values = expression_values, - meta_cols = meta_cols, - spatial_network_to_use = spatial_network_to_use, - wm_method = wm_method, - wm_name = wm_name, - node_values = node_values, - weight_matrix = weight_matrix, - verbose = verbose) - # unpack formatted data - use_values = data_list$use_values - feats = data_list$feats - weight_matrix = data_list$weight_matrix - - - # 2. perform autocor - res_dt = .run_spat_autocor_global(use_values = use_values, - feats = feats, - weight_matrix = weight_matrix, - method = method, - test_method = test_method, - mc_nsim = mc_nsim, - cor_name = cor_name) - - - - # if(method %in% local_methods) { - # res_dt = do.call('cbind', res_list) - # colnames(res_dt) = paste0(method, '_', colnames(res_dt)) - # res_dt[, cell_ID := wm_colnames] - # } - - - - # return info - if(isTRUE(return_gobject)) { - if(isTRUE(verbose)) wrap_msg('Appending', method, 'results to feature metadata: fDataDT()') - gobject = addFeatMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = res_dt, - by_column = TRUE, - column_feat_ID = 'feat_ID') - - return(gobject) - } else { - return(res_dt) - } +spatialAutoCorGlobal <- function(gobject = NULL, + spat_unit = NULL, + feat_type = NULL, + feats = NULL, + method = c("moran", "geary"), + data_to_use = c("expression", "cell_meta"), + expression_values = c("normalized", "scaled", "custom"), + meta_cols = NULL, + spatial_network_to_use = "kNN_network", + wm_method = c("distance", "adjacency"), + wm_name = "spat_weights", + node_values = NULL, + weight_matrix = NULL, + test_method = c("none", "monte_carlo"), + mc_nsim = 99, + cor_name = NULL, + return_gobject = FALSE, + verbose = TRUE) { + # 0. 
determine inputs + method <- match.arg(method, choices = c("moran", "geary")) + test_method <- match.arg(test_method, choices = c("none", "monte_carlo")) + data_to_use <- match.arg( + data_to_use, choices = c("expression", "cell_meta")) + if (is.null(cor_name)) cor_name <- method + if (!is.null(node_values)) { + if (is.numeric(node_values)) { + stop(wrap_txt('External "node_values" must be type numeric.', + errWidth = TRUE + )) + } + } + + use_ext_vals <- data.table::fifelse( + !is.null(node_values), yes = TRUE, no = FALSE) + use_sn <- data.table::fifelse( + !is.null(weight_matrix), yes = FALSE, no = TRUE) + + use_expr <- data.table::fcase( + isTRUE(use_ext_vals), FALSE, + data_to_use != "expression", FALSE, + default = TRUE + ) + + use_meta <- data.table::fcase( + isTRUE(use_ext_vals), FALSE, + data_to_use != "cell_meta", FALSE, + default = TRUE + ) + + if (data_to_use == "cell_meta") { + if (is.null(meta_cols)) { + stop(wrap_txt( + 'If "data_to_use" is "cell_meta" then a character vector of + cell metadata', + 'columns to use must be provided in "meta_cols"', + errWidth = TRUE + )) + } + } + if (isTRUE(return_gobject)) { + if (data_to_use == "cell_meta" | isTRUE(use_ext_vals)) { + stop(wrap_txt( + "Global spatial autocorrelations on cell_meta or external data + should not", + "be returned to the gobject. + > Please set return_gobject = FALSE", + errWidth = TRUE + )) + } + } + + # 1. setup + if (!is.null(gobject)) { + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + } else { # if null + if (any(!use_ext_vals, use_sn, return_gobject)) { + stop("gobject has not been provided") + } + } + # select and format input + data_list <- .evaluate_autocor_input( + gobject = gobject, + use_ext_vals = use_ext_vals, + use_sn = use_sn, + use_expr = use_expr, + use_meta = use_meta, + spat_unit = spat_unit, + feat_type = feat_type, + feats = feats, + data_to_use = data_to_use, + expression_values = expression_values, + meta_cols = meta_cols, + spatial_network_to_use = spatial_network_to_use, + wm_method = wm_method, + wm_name = wm_name, + node_values = node_values, + weight_matrix = weight_matrix, + verbose = verbose + ) + # unpack formatted data + use_values <- data_list$use_values + feats <- data_list$feats + weight_matrix <- data_list$weight_matrix + + + # 2. 
perform autocor + res_dt <- .run_spat_autocor_global( + use_values = use_values, + feats = feats, + weight_matrix = weight_matrix, + method = method, + test_method = test_method, + mc_nsim = mc_nsim, + cor_name = cor_name + ) + + # return info + if (isTRUE(return_gobject)) { + if (isTRUE(verbose)) + wrap_msg("Appending", method, + "results to feature metadata: fDataDT()") + gobject <- addFeatMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = res_dt, + by_column = TRUE, + column_feat_ID = "feat_ID" + ) + + return(gobject) + } else { + return(res_dt) + } } -#' @describeIn spatialAutoCor Local autocorrelation (values generated for each spatial ID) +#' @describeIn spatialAutoCor Local autocorrelation +#' (values generated for each spatial ID) #' -#' @param enrich_name name to assign local autocorrelation spatial enrichment results +#' @param enrich_name name to assign local autocorrelation spatial enrichment +#' results #' @param return_gobject (default = FALSE) whether to return results appended to #' @param output 'spatEnrObj' or 'data.table' #' metadata in the giotto object or as a data.table @@ -1647,436 +1774,506 @@ spatialAutoCorGlobal = function(gobject = NULL, #' \item{\emph{Local mean} 'mean'} #' } #' @export -spatialAutoCorLocal = function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL, - feats = NULL, - method = c('moran', 'gi', 'gi*', 'mean'), - data_to_use = c('expression', 'cell_meta'), - expression_values = c('normalized', 'scaled', 'custom'), - meta_cols = NULL, - spatial_network_to_use = 'kNN_network', - wm_method = c('distance', 'adjacency'), - wm_name = 'spat_weights', - node_values = NULL, - weight_matrix = NULL, - test_method = c('none'), - # cor_name = NULL, - enrich_name = NULL, - return_gobject = TRUE, - output = c('spatEnrObj', 'data.table'), - verbose = TRUE) { - - # 0. determine inputs - method_select = match.arg(method, choices = c('moran', 'gi', 'gi*', 'mean')) - data_to_use = match.arg(data_to_use, choices = c('expression', 'cell_meta')) - output = match.arg(output, choices = c('spatEnrObj', 'data.table')) - # if(is.null(cor_name)) cor_name = method - - if(method_select == 'moran') method = 'locmor' - else method = method_select - - if(!is.null(node_values)) { - if(is.numeric(node_values)) stop(wrap_txt('External "node_values" must be type numeric', - errWidth = TRUE)) - } - - - use_ext_vals = data.table::fifelse(!is.null(node_values), yes = TRUE, no = FALSE) - use_sn = data.table::fifelse(!is.null(weight_matrix), yes = FALSE, no = TRUE) - - use_expr = data.table::fcase( - isTRUE(use_ext_vals), FALSE, - data_to_use != 'expression', FALSE, - default = TRUE - ) - - use_meta = data.table::fcase( - isTRUE(use_ext_vals), FALSE, - data_to_use != 'cell_meta', FALSE, - default = TRUE - ) - - if(is.null(enrich_name)) { # name of spatEnrObj - enrich_name = data.table::fcase( - isTRUE(use_ext_vals), method_select, - data_to_use == 'expression', paste0('expr_', method_select), - data_to_use == 'cell_meta', paste0('meta_', method_select), - default = method_select - ) - } - - - - # 1. 
setup - if(!is.null(gobject)) { - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - } else { # if null - if(any(!use_ext_vals, use_sn, return_gobject)) { - stop('gobject has not been provided\n') - } - } - - - # select and format input - data_list = .evaluate_autocor_input(gobject = gobject, - use_ext_vals = use_ext_vals, - use_sn = use_sn, - use_expr = use_expr, - use_meta = use_meta, - spat_unit = spat_unit, - feat_type = feat_type, - feats = feats, - data_to_use = data_to_use, - expression_values = expression_values, - meta_cols = meta_cols, - spatial_network_to_use = spatial_network_to_use, - wm_method = wm_method, - wm_name = wm_name, - node_values = node_values, - weight_matrix = weight_matrix, - verbose = verbose) - # unpack formatted input - use_values = data_list$use_values - feats = data_list$feats - weight_matrix = data_list$weight_matrix - provenance = data_list$provenance - values = data_list$expr_values - IDs = data_list$IDs - - # spatIDs to use when returning autocor results - # Provide default spatIDs if missing - if(is.null(IDs)) { - IDs = seq(nrow(use_values)) - } - - # 2. perform autocor - res_dt = .run_spat_autocor_local(use_values = use_values, - feats = feats, - weight_matrix = weight_matrix, - method = method, - test_method = test_method, - IDs = IDs) - - # create spatial enrichment object - enr = create_spat_enr_obj(name = enrich_name, - method = method_select, - enrichDT = res_dt, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = provenance, - misc = if(use_expr) list(expr_values_used = values)) - - - # return info - if(isTRUE(return_gobject)) { - - if(isTRUE(verbose)) wrap_msg('Attaching ', method_select, ' results as spatial enrichment: "', - enrich_name, '"', sep = '') - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_spatial_enrichment(gobject = gobject, - spatenrichment = enr) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - return(gobject) - } else { - if(output == 'spatEnrObj') return(enr) - if(output == 'data.table') return(res_dt) - } +spatialAutoCorLocal <- function(gobject = NULL, + spat_unit = NULL, + feat_type = NULL, + feats = NULL, + method = c("moran", "gi", "gi*", "mean"), + data_to_use = c("expression", "cell_meta"), + expression_values = c("normalized", "scaled", "custom"), + meta_cols = NULL, + spatial_network_to_use = "kNN_network", + wm_method = c("distance", "adjacency"), + wm_name = "spat_weights", + node_values = NULL, + weight_matrix = NULL, + test_method = c("none"), + # cor_name = NULL, + enrich_name = NULL, + return_gobject = TRUE, + output = c("spatEnrObj", "data.table"), + verbose = TRUE) { + # 0. 
determine inputs + method_select <- match.arg( + method, choices = c("moran", "gi", "gi*", "mean")) + data_to_use <- match.arg( + data_to_use, choices = c("expression", "cell_meta")) + output <- match.arg(output, choices = c("spatEnrObj", "data.table")) + # if(is.null(cor_name)) cor_name = method + + if (method_select == "moran") { + method <- "locmor" + } else { + method <- method_select + } -} + if (!is.null(node_values)) { + if (is.numeric(node_values)) { + stop(wrap_txt('External "node_values" must be type numeric', + errWidth = TRUE + )) + } + } + use_ext_vals <- data.table::fifelse( + !is.null(node_values), yes = TRUE, no = FALSE) + use_sn <- data.table::fifelse( + !is.null(weight_matrix), yes = FALSE, no = TRUE) + use_expr <- data.table::fcase( + isTRUE(use_ext_vals), FALSE, + data_to_use != "expression", FALSE, + default = TRUE + ) + use_meta <- data.table::fcase( + isTRUE(use_ext_vals), FALSE, + data_to_use != "cell_meta", FALSE, + default = TRUE + ) + if (is.null(enrich_name)) { # name of spatEnrObj + enrich_name <- data.table::fcase( + isTRUE(use_ext_vals), method_select, + data_to_use == "expression", paste0("expr_", method_select), + data_to_use == "cell_meta", paste0("meta_", method_select), + default = method_select + ) + } -#' .run_spat_autocor_global -#' -#' @keywords internal -.run_spat_autocor_global = function(use_values, - feats, - weight_matrix, - method, - test_method, - mc_nsim, - cor_name) { - # data.table vars - cell_ID = nsim = NULL - - nfeats = length(feats) - if(test_method != 'none') step_size = ceiling(nfeats/100L) - else step_size = step_size = ceiling(nfeats/10L) - - progressr::with_progress({ - if(step_size > 1) pb = progressr::progressor(steps = nfeats/step_size) - res_list = lapply_flex( - seq_along(feats), - # future.packages = c('terra', 'data.table'), - function(feat_i) { - feat = feats[feat_i] - if(inherits(use_values, 'data.table')) { - feat_vals = eval(call('[', use_values, j = as.name(feat))) - } else { - feat_vals = use_values[, feat] + + # 1. 
setup + if (!is.null(gobject)) { + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + } else { # if null + if (any(!use_ext_vals, use_sn, return_gobject)) { + stop("gobject has not been provided") } + } - spat_ac = .spat_autocor_terra_numeric( - x = feat_vals, - w = weight_matrix, - method = method) - - - # test - if(test_method != 'none') { - if(test_method == 'monte_carlo') { - mc = sapply(seq(mc_nsim), function(i) .spat_autocor_terra_numeric( - x = sample(feat_vals), - w = weight_matrix, - method = method)) - P = 1 - sum((spat_ac > mc) / (nsim + 1)) - } - if(test_method == 'spdep') { - wrap_msg('spdep not yet implemented') - } - } - # increment progress - if(exists('pb')) if(feat_i %% step_size == 0) pb() + # select and format input + data_list <- .evaluate_autocor_input( + gobject = gobject, + use_ext_vals = use_ext_vals, + use_sn = use_sn, + use_expr = use_expr, + use_meta = use_meta, + spat_unit = spat_unit, + feat_type = feat_type, + feats = feats, + data_to_use = data_to_use, + expression_values = expression_values, + meta_cols = meta_cols, + spatial_network_to_use = spatial_network_to_use, + wm_method = wm_method, + wm_name = wm_name, + node_values = node_values, + weight_matrix = weight_matrix, + verbose = verbose + ) + # unpack formatted input + use_values <- data_list$use_values + feats <- data_list$feats + weight_matrix <- data_list$weight_matrix + provenance <- data_list$provenance + values <- data_list$expr_values + IDs <- data_list$IDs + + # spatIDs to use when returning autocor results + # Provide default spatIDs if missing + if (is.null(IDs)) { + IDs <- seq(nrow(use_values)) + } + # 2. 
perform autocor + res_dt <- .run_spat_autocor_local( + use_values = use_values, + feats = feats, + weight_matrix = weight_matrix, + method = method, + test_method = test_method, + IDs = IDs + ) - if(test_method == 'none') return(data.table::data.table(feat, spat_ac)) - else return(data.table::data.table(feat, spat_ac, P)) - } + # create spatial enrichment object + enr <- create_spat_enr_obj( + name = enrich_name, + method = method_select, + enrichDT = res_dt, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = provenance, + misc = if (use_expr) list(expr_values_used = values) ) - }) - res_dt = do.call('rbind', res_list) - if(test_method == 'none') colnames(res_dt) = c('feat_ID', cor_name) - else colnames(res_dt) = c('feat_ID', cor_name, paste0(cor_name, '_', test_method)) - return(res_dt) -} -#' .run_spat_autocor_local -#' -#' @keywords internal -.run_spat_autocor_local = function(use_values, - feats, - weight_matrix, - method, - test_method, - IDs) { - - cell_ID = NULL - - nfeats = length(feats) - if(test_method != 'none') step_size = ceiling(nfeats/100L) - else step_size = step_size = ceiling(nfeats/10L) - - progressr::with_progress({ - if(step_size > 1) pb = progressr::progressor(steps = nfeats/step_size) - res_list = lapply_flex( - seq_along(feats), - # future.packages = c('terra', 'data.table'), - function(feat_i) { - feat = feats[feat_i] - if(inherits(use_values, 'data.table')) { - feat_vals = eval(call('[', use_values, j = as.name(feat))) - } else { - feat_vals = use_values[, feat] + # return info + if (isTRUE(return_gobject)) { + if (isTRUE(verbose)) { + wrap_msg("Attaching ", method_select, + ' results as spatial enrichment: "', + enrich_name, '"', + sep = "" + ) } + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_spatial_enrichment( + gobject = gobject, + spatenrichment = enr + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + return(gobject) + } else { + if (output == "spatEnrObj") { + return(enr) + } + if (output == "data.table") { + return(res_dt) + } + } +} - spat_ac = .spat_autocor_terra_numeric( - x = feat_vals, - w = weight_matrix, - method = method) - # test - # if(test_method != 'none') { - # } - # increment progress - if(exists('pb')) if(feat_i %% step_size == 0) pb() - out_dt = data.table::data.table(spat_ac) - colnames(out_dt) = feat - return(out_dt) - } - ) - }) - res_dt = do.call('cbind', res_list) - # append cell_ID column - res_dt[, cell_ID := IDs] - return(res_dt) +#' .run_spat_autocor_global +#' +#' @keywords internal +.run_spat_autocor_global <- function(use_values, + feats, + weight_matrix, + method, + test_method, + mc_nsim, + cor_name) { + # data.table vars + cell_ID <- nsim <- NULL + + nfeats <- length(feats) + if (test_method != "none") { + step_size <- ceiling(nfeats / 100L) + } else { + step_size <- step_size <- ceiling(nfeats / 10L) + } + + progressr::with_progress({ + if (step_size > 1) pb <- progressr::progressor( + steps = nfeats / step_size) + res_list <- lapply_flex( + seq_along(feats), + # future.packages = c('terra', 'data.table'), + function(feat_i) { + feat <- feats[feat_i] + if (inherits(use_values, "data.table")) { + feat_vals <- eval(call("[", use_values, j = as.name(feat))) + } else { + feat_vals <- use_values[, feat] + } + + + spat_ac <- .spat_autocor_terra_numeric( + x = feat_vals, + w = weight_matrix, + method = method + ) + + + # test + if (test_method != "none") { + if (test_method == "monte_carlo") { + mc <- sapply(seq(mc_nsim), function(i) { + 
.spat_autocor_terra_numeric( + x = sample(feat_vals), + w = weight_matrix, + method = method + ) + }) + P <- 1 - sum((spat_ac > mc) / (nsim + 1)) + } + if (test_method == "spdep") { + wrap_msg("spdep not yet implemented") + } + } + # increment progress + if (exists("pb")) if (feat_i %% step_size == 0) pb() + + + if (test_method == "none") { + return(data.table::data.table(feat, spat_ac)) + } else { + return(data.table::data.table(feat, spat_ac, P)) + } + } + ) + }) + res_dt <- do.call("rbind", res_list) + if (test_method == "none") { + colnames(res_dt) <- c("feat_ID", cor_name) + } else { + colnames(res_dt) <- c("feat_ID", cor_name, paste0( + cor_name, "_", test_method)) + } + return(res_dt) +} + +#' .run_spat_autocor_local +#' +#' @keywords internal +.run_spat_autocor_local <- function(use_values, + feats, + weight_matrix, + method, + test_method, + IDs) { + cell_ID <- NULL + + nfeats <- length(feats) + if (test_method != "none") { + step_size <- ceiling(nfeats / 100L) + } else { + step_size <- step_size <- ceiling(nfeats / 10L) + } + + progressr::with_progress({ + if (step_size > 1) pb <- progressr::progressor( + steps = nfeats / step_size) + res_list <- lapply_flex( + seq_along(feats), + # future.packages = c('terra', 'data.table'), + function(feat_i) { + feat <- feats[feat_i] + if (inherits(use_values, "data.table")) { + feat_vals <- eval(call("[", use_values, j = as.name(feat))) + } else { + feat_vals <- use_values[, feat] + } + + spat_ac <- .spat_autocor_terra_numeric( + x = feat_vals, + w = weight_matrix, + method = method + ) + + # increment progress + if (exists("pb")) if (feat_i %% step_size == 0) pb() + + + out_dt <- data.table::data.table(spat_ac) + colnames(out_dt) <- feat + return(out_dt) + } + ) + }) + res_dt <- do.call("cbind", res_list) + # append cell_ID column + res_dt[, cell_ID := IDs] + return(res_dt) } # Determine which information to retrieve and how to format the information # Vars from upstream: -# use_sn - if true, extracts spatial network from gobject. Otherwise use externally provided info -# use_expr - if true, extracts expression information from gobject to use as node values -# use_meta - if true, extracts cell metadata information from gobject to use as node values +# use_sn - if true, extracts spatial network from gobject. Otherwise use +# externally provided info +# use_expr - if true, extracts expression information from gobject to use as +# node values +# use_meta - if true, extracts cell metadata information from gobject to use as +# node values # use_ext_vals - directly use externally provided node value information # Expected input: # 1. source of data per spatial ID, whether that be expression information, # cell metadata annotations, or external data -# 2. a spatial weight matrix for defining how important spatial interactions should -# be considered. This information can either be extracted spatial networks in the -# gobject with a pre-generated spatial weight matrix or generated during this call. +# 2. a spatial weight matrix for defining how important spatial interactions +# should +# be considered. This information can either be extracted spatial networks in +# the gobject with a pre-generated spatial weight matrix or generated during +# this call. # Expected output: # list of the following... -# 1. use_values - data per spatial ID. Formatted to be spatial ID (rows) by feats (cols) -# 2. feats - character vector of features in use_values to iterate through for autocor -# 3. 
weight_matrix - weight matrix (ordering checked to match with use_values if possible) +# 1. use_values - data per spatial ID. Formatted to be spatial ID (rows) by +# feats (cols) +# 2. feats - character vector of features in use_values to iterate through for +# autocor +# 3. weight_matrix - weight matrix (ordering checked to match with use_values +# if possible) # 4, IDs - cell_IDs if available -# Some additional information about information used in specific workflows are also returned -.evaluate_autocor_input = function(gobject, - use_ext_vals, - use_sn, - use_expr, - use_meta, - spat_unit, - feat_type, - feats, - data_to_use, - expression_values, - meta_cols, - spatial_network_to_use, - wm_method, - wm_name, - node_values, - weight_matrix, - verbose = TRUE) { - - cell_ID = NULL - - # 1. Get spatial network to either get or generate a spatial weight matrix - # End output is weight_matrix - if(isTRUE(use_sn)) { - #SPATNET=================================================================# - sn = getSpatialNetwork(gobject = gobject, - spat_unit = spat_unit, - name = spatial_network_to_use, - output = 'spatialNetworkObj') - weight_matrix = slot(sn, 'misc')$weight_matrix[[wm_name]] - - # if no weight_matrix already generated... - if(is.null(weight_matrix)) { - wm_method = match.arg(wm_method, choices = c('distance', 'adjacency')) - if(isTRUE(verbose)) wrap_msg( - 'No spatial weight matrix found in selected spatial network - Generating', wm_method, 'matrix from', spatial_network_to_use - ) - weight_matrix = createSpatialWeightMatrix(gobject = gobject, - spat_unit = spat_unit, - spatial_network_to_use = spatial_network_to_use, - wm_name = wm_name, - method = wm_method, - return_gobject = FALSE, - verbose = FALSE) - } - wm_colnames = colnames(weight_matrix) - #SPATNET=================================================================# - } - if(!isTRUE(use_sn)) { - #EXTSPATNET==============================================================# - if(!is.null(colnames(weight_matrix))) { - wm_colnames = colnames(weight_matrix) - if(isTRUE(verbose)) wrap_msg( - 'colnames of externally provided weight matrix will be matched to' - ) - } - #EXTSPATNET==============================================================# - } - - - # 2. Get and format node values for use with autocorrelation function. 
- # End outputs are: - # - use_values for a spatID (rows) by features (cols) table or matrix - # - feats the names of selected features to use that will be iterated through downstream - if(isTRUE(use_expr)) { - #EXPR====================================================================# - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - use_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - use_values = t_flex(use_values) - - # ensure identical ordering with giotto weight matrix - if(exists('wm_colnames')) use_values = use_values[wm_colnames,] - - if(is.null(feats)) feats = colnames(use_values) - IDs = rownames(use_values) - #EXPR====================================================================# - } - if(isTRUE(use_meta)) { - #META====================================================================# - if(is.null(meta_cols)) stop(wrap_txt('Metadata columns to autocorrelate must be given', - errWidth = TRUE)) - use_values = getCellMetadata( - gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE - ) - - # ensure identical ordering with giotto weight matrix - if(exists('wm_colnames')) { - new_order = data.table::chmatch(wm_colnames, use_values$cell_ID) - GiottoUtils::dt_set_row_order(use_values, new_order) - } - - feats = meta_cols - IDs = use_values[, cell_ID] - #META====================================================================# - } - if(isTRUE(use_ext_vals)) { - #EXTDATA=================================================================# - use_values = data.table::as.data.table(values = node_values) - - feats = 'values' - #EXTDATA=================================================================# - } - - - # 3. general formatting and checking - ## weight matrix type - if(!inherits(weight_matrix, c('Matrix', 'matrix', 'listw', 'nb'))) { - stop(wrap_txt('weight_matrix must be a matrix, Matrix, or listw', - errWidth = TRUE)) - } - - ## check if weight matrix dimensions match use_values - if((nrow(use_values) != ncol(weight_matrix)) | (nrow(use_values) != nrow(weight_matrix))) { - stop(wrap_txt('Number of values to correlate do not match number of weight matrix entries', - errWidth = TRUE)) - } - - - # return formatted values - # provenance included if available - return(list(use_values = use_values, - feats = feats, - weight_matrix = weight_matrix, - # method specific items: - expr_values = if(use_expr) values else NULL, - provenance = if(use_sn) prov(sn) else NULL, - IDs = if(use_expr | use_meta) IDs else NULL)) +# Some additional information about information used in specific workflows are +# also returned +.evaluate_autocor_input <- function(gobject, + use_ext_vals, + use_sn, + use_expr, + use_meta, + spat_unit, + feat_type, + feats, + data_to_use, + expression_values, + meta_cols, + spatial_network_to_use, + wm_method, + wm_name, + node_values, + weight_matrix, + verbose = TRUE) { + cell_ID <- NULL + + # 1. Get spatial network to either get or generate a spatial weight matrix + # End output is weight_matrix + if (isTRUE(use_sn)) { + # SPATNET==============================================================# + sn <- getSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + name = spatial_network_to_use, + output = "spatialNetworkObj" + ) + weight_matrix <- slot(sn, "misc")$weight_matrix[[wm_name]] + + # if no weight_matrix already generated... 
+ if (is.null(weight_matrix)) { + wm_method <- match.arg( + wm_method, choices = c("distance", "adjacency")) + if (isTRUE(verbose)) { + wrap_msg( + "No spatial weight matrix found in selected spatial network + Generating", wm_method, "matrix from", + spatial_network_to_use + ) + } + weight_matrix <- createSpatialWeightMatrix( + gobject = gobject, + spat_unit = spat_unit, + spatial_network_to_use = spatial_network_to_use, + wm_name = wm_name, + method = wm_method, + return_gobject = FALSE, + verbose = FALSE + ) + } + wm_colnames <- colnames(weight_matrix) + # SPATNET==============================================================# + } + if (!isTRUE(use_sn)) { + # EXTSPATNET===========================================================# + if (!is.null(colnames(weight_matrix))) { + wm_colnames <- colnames(weight_matrix) + if (isTRUE(verbose)) { + wrap_msg( + "colnames of externally provided weight matrix will be + matched to" + ) + } + } + # EXTSPATNET===========================================================# + } + + # 2. Get and format node values for use with autocorrelation function. + # End outputs are: + # - use_values for a spatID (rows) by features (cols) table or matrix + # - feats the names of selected features to use that will be iterated through downstream + if (isTRUE(use_expr)) { + # EXPR=================================================================# + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + use_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + use_values <- t_flex(use_values) + + # ensure identical ordering with giotto weight matrix + if (exists("wm_colnames")) use_values <- use_values[wm_colnames, ] + + if (is.null(feats)) feats <- colnames(use_values) + IDs <- rownames(use_values) + # EXPR=================================================================# + } + if (isTRUE(use_meta)) { + # META=================================================================# + if (is.null(meta_cols)) { + stop(wrap_txt("Metadata columns to autocorrelate must be given", + errWidth = TRUE + )) + } + use_values <- getCellMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + + # ensure identical ordering with giotto weight matrix + if (exists("wm_colnames")) { + new_order <- data.table::chmatch(wm_colnames, use_values$cell_ID) + GiottoUtils::dt_set_row_order(use_values, new_order) + } + + feats <- meta_cols + IDs <- use_values[, cell_ID] + # META=================================================================# + } + if (isTRUE(use_ext_vals)) { + # EXTDATA==============================================================# + use_values <- data.table::as.data.table(values = node_values) + + feats <- "values" + # EXTDATA==============================================================# + } + + + # 3. 
general formatting and checking + ## weight matrix type + if (!inherits(weight_matrix, c("Matrix", "matrix", "listw", "nb"))) { + stop(wrap_txt("weight_matrix must be a matrix, Matrix, or listw", + errWidth = TRUE + )) + } + + ## check if weight matrix dimensions match use_values + if ((nrow(use_values) != ncol(weight_matrix)) | + (nrow(use_values) != nrow(weight_matrix))) { + stop(wrap_txt("Number of values to correlate do not match number of + weight matrix entries", + errWidth = TRUE)) + } + + + # return formatted values + # provenance included if available + return(list( + use_values = use_values, + feats = feats, + weight_matrix = weight_matrix, + # method specific items: + expr_values = if (use_expr) values else NULL, + provenance = if (use_sn) prov(sn) else NULL, + IDs = if (use_expr | use_meta) IDs else NULL + )) } @@ -2095,144 +2292,160 @@ spatialAutoCorLocal = function(gobject = NULL, #' @description Rui to fill in #' @keywords internal enrich_deconvolution <- function(expr, - log_expr, - cluster_info, - ct_exp, - cutoff) { - - #####generate enrich 0/1 matrix based on expression matrix - ct_exp <- ct_exp[rowSums(ct_exp)>0,] - enrich_matrix<-matrix(0,nrow=dim(ct_exp)[1],ncol=dim(ct_exp)[2]) - rowmax_col<-Rfast::rowMaxs(ct_exp) - for (i in seq_along(rowmax_col)){ - enrich_matrix[i,rowmax_col[i]]=1 - } - colsum_ct_binary <- colSums(enrich_matrix) - for (i in seq_along(colsum_ct_binary)){ - if (colsum_ct_binary[i] <= 2){ - rank <- rank(-ct_exp[,i]) - enrich_matrix[rank <=2, i] =1 - } - } - rownames(enrich_matrix)<-rownames(ct_exp) - colnames(enrich_matrix)<-colnames(ct_exp) - - #####page enrich - enrich_result<-enrich_analysis(log_expr,enrich_matrix) - #####initialize dwls matrix - dwls_results<-matrix(0,nrow =dim(enrich_matrix)[2],ncol = dim(expr)[2]) - rownames(dwls_results)<-colnames(enrich_matrix) - colnames(dwls_results)<-colnames(expr) - cluster_sort<-mixedsort(unique(cluster_info)) - cluster_info<-cluster_info - for (i in seq_along(cluster_sort)){ - cluster_i_enrich<-enrich_result[,which(cluster_info==cluster_sort[i])] - row_i_max<-Rfast::rowMaxs(cluster_i_enrich,value = TRUE) - ct<-rownames(enrich_result)[which(row_i_max>cutoff)] - if (length(ct)<2){ - sort_rank<-sort(row_i_max,decreasing = T) - ct<-rownames(enrich_result)[which(row_i_max>=sort_rank[2])] - } - ct_gene<-c() - for (j in seq_along(ct)){ - sig_gene_j<-rownames(enrich_matrix)[which(enrich_matrix[,ct[j]]==1)] - ct_gene<-c(ct_gene,sig_gene_j) - } - uniq_ct_gene<-intersect(rownames(expr),unique(ct_gene)) - select_sig_exp<-ct_exp[uniq_ct_gene,ct] - cluster_i_cell<-which(cluster_info==cluster_sort[i]) - cluster_cell_exp<-expr[uniq_ct_gene,cluster_i_cell] - - cluster_i_dwls<-optimize_deconvolute_dwls(cluster_cell_exp,select_sig_exp) - dwls_results[ct,cluster_i_cell]<-cluster_i_dwls - } - #####remove negative values - for (i in dim(dwls_results)[1]){ - negtive_index<-which(dwls_results[i,]<0) - dwls_results[i,negtive_index]=0 ####* fixed a typo, with "==" the negative values weren't removed - } - return(dwls_results) + log_expr, + cluster_info, + ct_exp, + cutoff) { + ##### generate enrich 0/1 matrix based on expression matrix + ct_exp <- ct_exp[rowSums(ct_exp) > 0, ] + enrich_matrix <- matrix(0, nrow = dim(ct_exp)[1], ncol = dim(ct_exp)[2]) + rowmax_col <- Rfast::rowMaxs(ct_exp) + for (i in seq_along(rowmax_col)) { + enrich_matrix[i, rowmax_col[i]] <- 1 + } + colsum_ct_binary <- colSums(enrich_matrix) + for (i in seq_along(colsum_ct_binary)) { + if (colsum_ct_binary[i] <= 2) { + rank <- rank(-ct_exp[, i]) + 
enrich_matrix[rank <= 2, i] <- 1 + } + } + rownames(enrich_matrix) <- rownames(ct_exp) + colnames(enrich_matrix) <- colnames(ct_exp) + + ##### page enrich + enrich_result <- enrich_analysis(log_expr, enrich_matrix) + ##### initialize dwls matrix + dwls_results <- matrix(0, nrow = dim(enrich_matrix)[2], ncol = dim(expr)[2]) + rownames(dwls_results) <- colnames(enrich_matrix) + colnames(dwls_results) <- colnames(expr) + cluster_sort <- mixedsort(unique(cluster_info)) + cluster_info <- cluster_info + for (i in seq_along(cluster_sort)) { + cluster_i_enrich <- enrich_result[ + , which(cluster_info == cluster_sort[i])] + row_i_max <- Rfast::rowMaxs(cluster_i_enrich, value = TRUE) + ct <- rownames(enrich_result)[which(row_i_max > cutoff)] + if (length(ct) < 2) { + sort_rank <- sort(row_i_max, decreasing = TRUE) + ct <- rownames(enrich_result)[which(row_i_max >= sort_rank[2])] + } + ct_gene <- c() + for (j in seq_along(ct)) { + sig_gene_j <- rownames(enrich_matrix)[ + which(enrich_matrix[, ct[j]] == 1)] + ct_gene <- c(ct_gene, sig_gene_j) + } + uniq_ct_gene <- intersect(rownames(expr), unique(ct_gene)) + select_sig_exp <- ct_exp[uniq_ct_gene, ct] + cluster_i_cell <- which(cluster_info == cluster_sort[i]) + cluster_cell_exp <- expr[uniq_ct_gene, cluster_i_cell] + + cluster_i_dwls <- optimize_deconvolute_dwls( + cluster_cell_exp, select_sig_exp) + dwls_results[ct, cluster_i_cell] <- cluster_i_dwls + } + ##### remove negative values + for (i in dim(dwls_results)[1]) { + negtive_index <- which(dwls_results[i, ] < 0) + dwls_results[i, negtive_index] <- 0 + ####* fixed a typo, with "==" the negative values weren't removed + } + return(dwls_results) } #' @title spot_deconvolution #' @description Rui to fill in #' @keywords internal -spot_deconvolution<-function(expr, - cluster_info, - ct_exp, - binary_matrix){ - #####generate enrich 0/1 matrix based on expression matrix - enrich_matrix<-matrix(0,nrow=dim(ct_exp)[1],ncol=dim(ct_exp)[2]) - rowmax_col<-Rfast::rowMaxs(ct_exp) - for (i in seq_along(rowmax_col)){ - enrich_matrix[i,rowmax_col[i]]=1 - } - rownames(enrich_matrix)<-rownames(ct_exp) - colnames(enrich_matrix)<-colnames(ct_exp) - - cluster_sort<-mixedsort(unique(cluster_info)) - ####initialize dwls matrix - dwls_results<-matrix(0,nrow =dim(ct_exp)[2],ncol = dim(expr)[2]) - rownames(dwls_results)<-colnames(ct_exp) - colnames(dwls_results)<-colnames(expr) - - for (i in seq_along(cluster_sort)){ - cluster_i_matrix<-binary_matrix[,which(cluster_info==cluster_sort[i])] - row_i_max<-Rfast::rowMaxs(cluster_i_matrix,value = TRUE) - ct_i<-rownames(cluster_i_matrix)[which(row_i_max==1)] - ########calculate proportion based on binarized deconvolution results at first step - if (length(ct_i)==1){ - dwls_results[ct_i[1],which(cluster_info==cluster_sort[i])]==1 - } else { - ct_gene<-c() - for (j in seq_along(ct_i)){ - sig_gene_j<-rownames(enrich_matrix)[which(enrich_matrix[,ct_i[j]]==1)] - ct_gene<-c(ct_gene,sig_gene_j) - } - uniq_ct_gene<-intersect(rownames(expr),unique(ct_gene)) - select_sig_exp<-ct_exp[uniq_ct_gene,ct_i] - cluster_i_cell<-which(cluster_info==cluster_sort[i]) - cluster_cell_exp<-expr[uniq_ct_gene,cluster_i_cell] - ######calculate - ######overlap signature with spatial genes - all_exp<-Matrix::rowMeans(cluster_cell_exp) - solution_all_exp<-solve_OLS_internal(select_sig_exp,all_exp) - constant_J<-find_dampening_constant(select_sig_exp,all_exp,solution_all_exp) - ######deconvolution for each spot - for(k in 1:(dim(cluster_cell_exp)[2])){ - B<-Matrix::as.matrix(cluster_cell_exp[,k]) - 
ct_spot_k<-rownames(cluster_i_matrix)[which(cluster_i_matrix[,k]==1)] - if (sum(B)==0 || length(ct_spot_k)==0){ ####* must include the case where all genes are 0 - dwls_results[,colnames(cluster_cell_exp)[k]]<-NA ####* will produce NAs for some spots in the output - next; ####* no need to look into this spot any more - } - if (length(ct_spot_k)==1){ - dwls_results[ct_spot_k[1],colnames(cluster_cell_exp)[k]]<-1 +spot_deconvolution <- function(expr, + cluster_info, + ct_exp, + binary_matrix) { + ##### generate enrich 0/1 matrix based on expression matrix + enrich_matrix <- matrix(0, nrow = dim(ct_exp)[1], ncol = dim(ct_exp)[2]) + rowmax_col <- Rfast::rowMaxs(ct_exp) + for (i in seq_along(rowmax_col)) { + enrich_matrix[i, rowmax_col[i]] <- 1 + } + rownames(enrich_matrix) <- rownames(ct_exp) + colnames(enrich_matrix) <- colnames(ct_exp) + + cluster_sort <- mixedsort(unique(cluster_info)) + #### initialize dwls matrix + dwls_results <- matrix(0, nrow = dim(ct_exp)[2], ncol = dim(expr)[2]) + rownames(dwls_results) <- colnames(ct_exp) + colnames(dwls_results) <- colnames(expr) + + for (i in seq_along(cluster_sort)) { + cluster_i_matrix <- binary_matrix[ + , which(cluster_info == cluster_sort[i])] + row_i_max <- Rfast::rowMaxs(cluster_i_matrix, value = TRUE) + ct_i <- rownames(cluster_i_matrix)[which(row_i_max == 1)] + ######## calculate proportion based on binarized deconvolution + ######## results at first step + if (length(ct_i) == 1) { + dwls_results[ct_i[1], which(cluster_info == cluster_sort[i])] == 1 } else { - ct_k_gene<-c() - for (m in seq_along(ct_spot_k)){ - sig_gene_k<-rownames(enrich_matrix)[which(enrich_matrix[,ct_spot_k[m]]==1)] - ct_k_gene<-c(ct_k_gene,sig_gene_k) - } - uniq_ct_k_gene<-intersect(rownames(ct_exp),unique(ct_k_gene)) - S_k<-Matrix::as.matrix(ct_exp[uniq_ct_k_gene,ct_spot_k]) - if (sum(B[uniq_ct_k_gene,])==0){ ####* must include the case all genes are 0 - dwls_results[,colnames(cluster_cell_exp)[k]]<-NA ####* will produce NAs for some spots in the output - } else { - solDWLS<-optimize_solveDampenedWLS(S_k,B[uniq_ct_k_gene,],constant_J) - dwls_results[names(solDWLS),colnames(cluster_cell_exp)[k]]<-solDWLS - } + ct_gene <- c() + for (j in seq_along(ct_i)) { + sig_gene_j <- rownames(enrich_matrix)[ + which(enrich_matrix[, ct_i[j]] == 1)] + ct_gene <- c(ct_gene, sig_gene_j) + } + uniq_ct_gene <- intersect(rownames(expr), unique(ct_gene)) + select_sig_exp <- ct_exp[uniq_ct_gene, ct_i] + cluster_i_cell <- which(cluster_info == cluster_sort[i]) + cluster_cell_exp <- expr[uniq_ct_gene, cluster_i_cell] + ###### calculate + ###### overlap signature with spatial genes + all_exp <- Matrix::rowMeans(cluster_cell_exp) + solution_all_exp <- solve_OLS_internal(select_sig_exp, all_exp) + constant_J <- find_dampening_constant( + select_sig_exp, all_exp, solution_all_exp) + ###### deconvolution for each spot + for (k in 1:(dim(cluster_cell_exp)[2])) { + B <- Matrix::as.matrix(cluster_cell_exp[, k]) + ct_spot_k <- rownames(cluster_i_matrix)[ + which(cluster_i_matrix[, k] == 1)] + if (sum(B) == 0 || length(ct_spot_k) == 0) { + ####* must include the case where all genes are 0 + dwls_results[, colnames(cluster_cell_exp)[k]] <- NA + ####* will produce NAs for some spots in the output + next ####* no need to look into this spot any more + } + if (length(ct_spot_k) == 1) { + dwls_results[ + ct_spot_k[1], colnames(cluster_cell_exp)[k]] <- 1 + } else { + ct_k_gene <- c() + for (m in seq_along(ct_spot_k)) { + sig_gene_k <- rownames(enrich_matrix)[which( + enrich_matrix[, ct_spot_k[m]] == 1)] + 
ct_k_gene <- c(ct_k_gene, sig_gene_k) + } + uniq_ct_k_gene <- intersect( + rownames(ct_exp), unique(ct_k_gene)) + S_k <- Matrix::as.matrix(ct_exp[uniq_ct_k_gene, ct_spot_k]) + if (sum(B[uniq_ct_k_gene, ]) == 0) { + ####* must include the case all genes are 0 + dwls_results[, colnames(cluster_cell_exp)[k]] <- NA + ####* will produce NAs for some spots in the output + } else { + solDWLS <- optimize_solveDampenedWLS(S_k, B[ + uniq_ct_k_gene, ], constant_J) + dwls_results[names(solDWLS), colnames(cluster_cell_exp)[k]] <- solDWLS + } + } + } } - } - } - } - #####remove negative values - for (i in dim(dwls_results)[1]){ - negtive_index<-which(dwls_results[i,]<0) - dwls_results[i,negtive_index]=0 ####* fixed a typo, with "==" the negative values weren't removed - } - return(dwls_results) + } + ##### remove negative values + for (i in dim(dwls_results)[1]) { + negtive_index <- which(dwls_results[i, ] < 0) + dwls_results[i, negtive_index] <- 0 + } + return(dwls_results) } @@ -2241,20 +2454,23 @@ spot_deconvolution<-function(expr, #' @description Rui to fill in #' @keywords internal cluster_enrich_analysis <- function(exp_matrix, - cluster_info, - enrich_sig_matrix) { - uniq_cluster<-mixedsort(unique(cluster_info)) - if(length(uniq_cluster) == 1) { - stop("Only one cluster identified, need at least two.") - } - cluster_exp<-NULL - for (i in uniq_cluster){ - cluster_exp<-cbind(cluster_exp,(apply(exp_matrix,1,function(y) mean(y[which(cluster_info==i)])))) - } - log_cluster_exp<-log2(cluster_exp+1) - colnames(log_cluster_exp)<-uniq_cluster - cluster_enrich<-enrich_analysis(log_cluster_exp,enrich_sig_matrix) - return(cluster_enrich) + cluster_info, + enrich_sig_matrix) { + uniq_cluster <- mixedsort(unique(cluster_info)) + if (length(uniq_cluster) == 1) { + stop("Only one cluster identified, need at least two.") + } + cluster_exp <- NULL + for (i in uniq_cluster) { + cluster_exp <- cbind( + cluster_exp, + (apply(exp_matrix, 1, + function(y) mean(y[which(cluster_info == i)])))) + } + log_cluster_exp <- log2(cluster_exp + 1) + colnames(log_cluster_exp) <- uniq_cluster + cluster_enrich <- enrich_analysis(log_cluster_exp, enrich_sig_matrix) + return(cluster_enrich) } @@ -2262,40 +2478,40 @@ cluster_enrich_analysis <- function(exp_matrix, #' @description Rui to fill in #' @keywords internal enrich_analysis <- function(expr_values, - sign_matrix) { - - # output enrichment - # only continue with genes present in both datasets - interGene = intersect(rownames(sign_matrix), rownames(expr_values)) - filterSig = sign_matrix[interGene,] - signames = rownames(filterSig)[which(filterSig[,1]==1)] - # calculate mean gene expression - #mean_gene_expr = rowMeans(expr_values) - mean_gene_expr = log2(Matrix::rowMeans(2^expr_values-1, dims = 1)+1) - geneFold = expr_values - mean_gene_expr - # calculate sample/spot mean and sd - cellColMean = apply(geneFold,2, mean) - cellColSd = apply(geneFold,2, stats::sd) - - # get enrichment scores - enrichment = matrix(data=NA,nrow = dim(filterSig)[2],ncol=length(cellColMean)) - for (i in (1:dim(filterSig)[2])){ - signames = rownames(filterSig)[which(filterSig[,i]==1)] - sigColMean = apply(geneFold[signames,],2,mean) - m = length(signames) - vectorX = NULL - for (j in(seq_along(cellColMean))){ - Sm = sigColMean[j] - u = cellColMean[j] - sigma = cellColSd[j] - zscore = (Sm - u)* m^(1/2) / sigma - vectorX = append(vectorX,zscore) - } - enrichment[i,] = vectorX - } - rownames(enrichment) = colnames(filterSig) - colnames(enrichment) = names(cellColMean) - return(enrichment) + sign_matrix) { 
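# --- Toy illustration (editor's example, not from the patch) of the per-cell
# PAGE-style z-score computed by enrich_analysis() below:
# z = (Sm - mu) * sqrt(m) / sigma, where Sm is the mean fold change of the m
# signature genes in one cell and mu / sigma are the mean and sd of the fold
# changes of all genes in that cell. The numbers here are made up.
gene_fold <- c(g1 = 1.2, g2 = -0.3, g3 = 0.8, g4 = 0.1, g5 = -0.5)
signature <- c("g1", "g3") # genes marking one hypothetical cell type
Sm <- mean(gene_fold[signature])
mu <- mean(gene_fold)
sigma <- stats::sd(gene_fold)
zscore <- (Sm - mu) * sqrt(length(signature)) / sigma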
+ # output enrichment + # only continue with genes present in both datasets + interGene <- intersect(rownames(sign_matrix), rownames(expr_values)) + filterSig <- sign_matrix[interGene, ] + signames <- rownames(filterSig)[which(filterSig[, 1] == 1)] + # calculate mean gene expression + # mean_gene_expr = rowMeans(expr_values) + mean_gene_expr <- log2(Matrix::rowMeans(2^expr_values - 1, dims = 1) + 1) + geneFold <- expr_values - mean_gene_expr + # calculate sample/spot mean and sd + cellColMean <- apply(geneFold, 2, mean) + cellColSd <- apply(geneFold, 2, stats::sd) + + # get enrichment scores + enrichment <- matrix( + data = NA, nrow = dim(filterSig)[2], ncol = length(cellColMean)) + for (i in (1:dim(filterSig)[2])) { + signames <- rownames(filterSig)[which(filterSig[, i] == 1)] + sigColMean <- apply(geneFold[signames, ], 2, mean) + m <- length(signames) + vectorX <- NULL + for (j in (seq_along(cellColMean))) { + Sm <- sigColMean[j] + u <- cellColMean[j] + sigma <- cellColSd[j] + zscore <- (Sm - u) * m^(1 / 2) / sigma + vectorX <- append(vectorX, zscore) + } + enrichment[i, ] <- vectorX + } + rownames(enrichment) <- colnames(filterSig) + colnames(enrichment) <- names(cellColMean) + return(enrichment) } @@ -2304,113 +2520,119 @@ enrich_analysis <- function(expr_values, #' @description Rui to fill in #' @keywords internal optimize_deconvolute_dwls <- function(exp, - Signature) { - ######overlap signature with spatial genes - Genes<-intersect(rownames(Signature),rownames(exp)) - S<-Signature[Genes,] - S<-Matrix::as.matrix(S) - Bulk<-Matrix::as.matrix(exp) - subBulk = Bulk[Genes,] - allCounts_DWLS<-NULL - all_exp<-Matrix::rowMeans(exp) - - solution_all_exp<-solve_OLS_internal(S,all_exp[Genes]) - - constant_J<-find_dampening_constant(S,all_exp[Genes],solution_all_exp) - for(j in 1:(dim(subBulk)[2])){ - B<-subBulk[,j] - if (sum(B)>0){ - solDWLS<-optimize_solveDampenedWLS(S,B,constant_J) - } else{ - # solDWLS <- rep(0, length(B)) ####* wrong dimension, causes warnings - solDWLS <- rep(0, ncol(S)) ####* corrected dim - # names(solDWLS) <- names(B) ####* wrong dim name - names(solDWLS) <- colnames(S) ####* corrected dim name - } - allCounts_DWLS<-cbind(allCounts_DWLS,solDWLS) - } - colnames(allCounts_DWLS)<-colnames(exp) - return(allCounts_DWLS) + Signature) { + ###### overlap signature with spatial genes + Genes <- intersect(rownames(Signature), rownames(exp)) + S <- Signature[Genes, ] + S <- Matrix::as.matrix(S) + Bulk <- Matrix::as.matrix(exp) + subBulk <- Bulk[Genes, ] + allCounts_DWLS <- NULL + all_exp <- Matrix::rowMeans(exp) + + solution_all_exp <- solve_OLS_internal(S, all_exp[Genes]) + + constant_J <- find_dampening_constant(S, all_exp[Genes], solution_all_exp) + for (j in 1:(dim(subBulk)[2])) { + B <- subBulk[, j] + if (sum(B) > 0) { + solDWLS <- optimize_solveDampenedWLS(S, B, constant_J) + } else { + # solDWLS <- rep(0, length(B)) ####* wrong dimension, + # causes warnings + solDWLS <- rep(0, ncol(S)) ####* corrected dim + # names(solDWLS) <- names(B) ####* wrong dim name + names(solDWLS) <- colnames(S) ####* corrected dim name + } + allCounts_DWLS <- cbind(allCounts_DWLS, solDWLS) + } + colnames(allCounts_DWLS) <- colnames(exp) + return(allCounts_DWLS) } #' @title optimize_solveDampenedWLS #' @description Rui to fill in #' @keywords internal -optimize_solveDampenedWLS<-function(S, - B, - constant_J){ - #first solve OLS, use this solution to find a starting point for the weights - solution = solve_OLS_internal(S,B) - #now use dampened WLS, iterate weights until convergence - iterations = 
0 - changes = c() - #find dampening constant for weights using cross-validation - j = constant_J - change = 1 - - while(change > .01 & iterations < 1000){ - newsolution = solve_dampened_WLSj(S, B, solution, j) - #decrease step size for convergence - solutionAverage = rowMeans(cbind(newsolution, - matrix(solution, - nrow = length(solution), - ncol = 4))) - change = norm(Matrix::as.matrix(solutionAverage-solution)) - solution = solutionAverage - iterations = iterations+1 - changes = c(changes, change) - } - - - return(solution/sum(solution)) +optimize_solveDampenedWLS <- function(S, + B, + constant_J) { + # first solve OLS, use this solution to find a starting point for the + # weights + solution <- solve_OLS_internal(S, B) + # now use dampened WLS, iterate weights until convergence + iterations <- 0 + changes <- c() + # find dampening constant for weights using cross-validation + j <- constant_J + change <- 1 + + while (change > .01 & iterations < 1000) { + newsolution <- solve_dampened_WLSj(S, B, solution, j) + # decrease step size for convergence + solutionAverage <- rowMeans(cbind( + newsolution, + matrix(solution, + nrow = length(solution), + ncol = 4 + ) + )) + change <- norm(Matrix::as.matrix(solutionAverage - solution)) + solution <- solutionAverage + iterations <- iterations + 1 + changes <- c(changes, change) + } + + + return(solution / sum(solution)) } #' @title find_dampening_constant #' @description find a dampening constant for the weights using cross-validation #' @keywords internal -find_dampening_constant<-function(S, - B, - goldStandard){ - - - solutionsSd = NULL - - #goldStandard is used to define the weights - sol = goldStandard - ws = as.vector((1/(S%*%sol))^2) - wsScaled = ws/min(ws) - wsScaledMinusInf = wsScaled - - #ignore infinite weights - if(max(wsScaled) == "Inf"){ - wsScaledMinusInf = wsScaled[-which(wsScaled == "Inf")] - } - - #try multiple values of the dampening constant (multiplier) - #for each, calculate the variance of the dampened weighted solution for a subset of genes - for (j in 1:ceiling(log2(max(wsScaledMinusInf)))){ - multiplier = 1*2^(j-1) - wsDampened = wsScaled - wsDampened[which(wsScaled > multiplier)] = multiplier - solutions = NULL - seeds = c(1:100) - for (i in 1:100){ - set.seed(seeds[i]) #make nondeterministic - subset = sample(length(ws), size=length(ws) * 0.5) #randomly select half of gene set - - #solve dampened weighted least squares for subset - fit = stats::lm (B[subset] ~ -1+S[subset,], weights = wsDampened[subset]) - sol = fit$coef * sum(goldStandard) / sum(fit$coef) - solutions = cbind(solutions, sol) - } - solutionsSd = cbind(solutionsSd, apply(solutions, 1, stats::sd)) - } - - #choose dampening constant that results in least cross-validation variance - j = which.min(colMeans(solutionsSd^2)) - return(j) +find_dampening_constant <- function(S, + B, + goldStandard) { + solutionsSd <- NULL + + # goldStandard is used to define the weights + sol <- goldStandard + ws <- as.vector((1 / (S %*% sol))^2) + wsScaled <- ws / min(ws) + wsScaledMinusInf <- wsScaled + + # ignore infinite weights + if (max(wsScaled) == "Inf") { + wsScaledMinusInf <- wsScaled[-which(wsScaled == "Inf")] + } + + # try multiple values of the dampening constant (multiplier) + # for each, calculate the variance of the dampened weighted solution for + # a subset of genes + for (j in 1:ceiling(log2(max(wsScaledMinusInf)))) { + multiplier <- 1 * 2^(j - 1) + wsDampened <- wsScaled + wsDampened[which(wsScaled > multiplier)] <- multiplier + solutions <- NULL + seeds <- 
c(1:100) + for (i in 1:100) { + set.seed(seeds[i]) # make nondeterministic + subset <- sample(length(ws), size = length(ws) * 0.5) + # randomly select half of gene set + + # solve dampened weighted least squares for subset + fit <- stats::lm( + B[subset] ~ -1 + S[subset, ], weights = wsDampened[subset]) + sol <- fit$coef * sum(goldStandard) / sum(fit$coef) + solutions <- cbind(solutions, sol) + } + solutionsSd <- cbind(solutionsSd, apply(solutions, 1, stats::sd)) + } + + # choose dampening constant that results in least cross-validation variance + j <- which.min(colMeans(solutionsSd^2)) + return(j) } @@ -2418,60 +2640,64 @@ find_dampening_constant<-function(S, #' @description basic functions for dwls #' @keywords internal solve_OLS_internal <- function(S, - B){ - D = t(S)%*%S - d = t(S)%*%B - A = cbind(diag(dim(S)[2])) - bzero = c(rep(0,dim(S)[2])) - - - out = tryCatch( - expr = {quadprog::solve.QP(Dmat = D, - dvec = d, - Amat = A, - bvec = bzero)$solution}, - - error = function(cond) { - message('Original error message: \n') - message(cond) - - message('\n Will try to fix error with Matrix::nearPD()') - return(NULL) - } - ) - - - if(is.null(out)) { - - D = Matrix::nearPD(D) - D = as.matrix(D$mat) - - out = tryCatch( - expr = {quadprog::solve.QP(Dmat = D, - dvec = d, - Amat = A, - bvec = bzero)$solution}, - - error = function(cond) { - message('Original error message: \n') - message(cond) - - message('\n nearPD() did not fix the error') - return(NULL) - } + B) { + D <- t(S) %*% S + d <- t(S) %*% B + A <- cbind(diag(dim(S)[2])) + bzero <- c(rep(0, dim(S)[2])) + + + out <- tryCatch( + expr = { + quadprog::solve.QP( + Dmat = D, + dvec = d, + Amat = A, + bvec = bzero + )$solution + }, + error = function(cond) { + message("Original error message:") + message(cond) + + message("Will try to fix error with Matrix::nearPD()") + return(NULL) + } ) - if(is.null(out)) { - stop('Errors could not be fixed') + + if (is.null(out)) { + D <- Matrix::nearPD(D) + D <- as.matrix(D$mat) + + out <- tryCatch( + expr = { + quadprog::solve.QP( + Dmat = D, + dvec = d, + Amat = A, + bvec = bzero + )$solution + }, + error = function(cond) { + message("Original error message:") + message(cond) + + message("nearPD() did not fix the error") + return(NULL) + } + ) + + if (is.null(out)) { + stop("Errors could not be fixed") + } else { + names(out) <- colnames(S) + } } else { - names(out) = colnames(S) + names(out) <- colnames(S) } - } else { - names(out) = colnames(S) - } - - return(out) + return(out) } # @@ -2479,36 +2705,39 @@ solve_OLS_internal <- function(S, #' @description solve WLS given a dampening constant #' @keywords internal solve_dampened_WLSj <- function(S, - B, - goldStandard, - j){ - multiplier = 1*2^(j-1) - sol = goldStandard - ws = as.vector((1/(S%*%sol))^2) - wsScaled = ws/min(ws) - wsDampened = wsScaled - wsDampened[which(wsScaled > multiplier)] = multiplier - W = diag(wsDampened) - D = t(S)%*%W%*%S - d = t(S)%*%W%*%B - A = cbind(diag(dim(S)[2])) - bzero = c(rep(0,dim(S)[2])) - sc = norm(D,"2") - - D_positive_definite <- Matrix::nearPD(x = D/sc) - - solution <- quadprog::solve.QP(Dmat = as.matrix(D_positive_definite$mat), - dvec = d/sc, - Amat = A, - bvec = bzero)$solution - - names(solution) = colnames(S) - return(solution) + B, + goldStandard, + j) { + multiplier <- 1 * 2^(j - 1) + sol <- goldStandard + ws <- as.vector((1 / (S %*% sol))^2) + wsScaled <- ws / min(ws) + wsDampened <- wsScaled + wsDampened[which(wsScaled > multiplier)] <- multiplier + W <- diag(wsDampened) + D <- t(S) %*% W %*% S + d <- 
t(S) %*% W %*% B + A <- cbind(diag(dim(S)[2])) + bzero <- c(rep(0, dim(S)[2])) + sc <- norm(D, "2") + + D_positive_definite <- Matrix::nearPD(x = D / sc) + + solution <- quadprog::solve.QP( + Dmat = as.matrix(D_positive_definite$mat), + dvec = d / sc, + Amat = A, + bvec = bzero + )$solution + + names(solution) <- colnames(S) + return(solution) } #' @title runDWLSDeconv -#' @description Function to perform DWLS deconvolution based on single cell expression data +#' @description Function to perform DWLS deconvolution based on single cell +#' expression data #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -2521,144 +2750,160 @@ solve_dampened_WLSj <- function(S, #' @param name name to give to spatial deconvolution results, default = DWLS #' @param return_gobject return giotto object #' @return giotto object or deconvolution results -#' @seealso \url{https://github.com/dtsoucas/DWLS} for the \emph{DWLS} bulk deconvolution method, -#' and \doi{10.1186/s13059-021-02362-7} for \emph{spatialDWLS}, the spatial implementation used here. +#' @seealso \url{https://github.com/dtsoucas/DWLS} for the \emph{DWLS} bulk +#' deconvolution method, and \doi{10.1186/s13059-021-02362-7} for +#' \emph{spatialDWLS}, the spatial implementation used here. #' @export -runDWLSDeconv = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized'), - logbase = 2, - cluster_column = 'leiden_clus', - sign_matrix, - n_cell = 50, - cutoff = 2, - name = NULL, - return_gobject = TRUE) { - - - # verify if optional package is installed - package_check(pkg_name = "quadprog", repository = "CRAN") - package_check(pkg_name = "Rfast", repository = "CRAN") - - ## check parameters ## - if(is.null(name)) name = 'DWLS' - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'exprObj') - - # if(!'matrix' %in% class(expr_values[])) { - # warning('this matrix will be converted to a dense and memory intensive base matrix ...') - # expr_values[] = as.matrix(expr_values[]) - # } - - # #transform expression data to no log data - nolog_expr = logbase^(expr_values[])-1 - - # cluster column - cell_metadata = pDataDT(gobject, - spat_unit = spat_unit, - feat_type = feat_type) - if(!cluster_column %in% colnames(cell_metadata)) { - stop('\n cluster column not found \n') - } - cluster = cell_metadata[[cluster_column]] - - - #####getting overlapped gene lists - sign_matrix = as.matrix(sign_matrix) - intersect_gene = intersect(rownames(sign_matrix), rownames(nolog_expr)) - filter_Sig = sign_matrix[intersect_gene,] - filter_expr = nolog_expr[intersect_gene,] - filter_log_expr = expr_values[][intersect_gene,] - - #####first round spatial deconvolution ##spot or cluster - enrich_spot_proportion = enrich_deconvolution(expr = filter_expr, - log_expr = filter_log_expr, - cluster_info = cluster, - ct_exp = filter_Sig, - cutoff = cutoff) - - ####re-deconvolution based on spatial resolution - resolution = (1/n_cell) - binarize_proportion = ifelse(enrich_spot_proportion >= resolution, 1, 0) - spot_proportion <- spot_deconvolution(expr = filter_expr, - cluster_info = cluster, - 
ct_exp = filter_Sig, - binary_matrix = binarize_proportion) - deconvolutionDT = data.table::data.table(cell_ID = colnames(spot_proportion)) - deconvolutionDT = cbind(deconvolutionDT, data.table::as.data.table(t(spot_proportion))) - - # create spatial enrichment object - enrObj = create_spat_enr_obj(name = name, - method = 'DWLS', - enrichDT = deconvolutionDT, - spat_unit = spat_unit, - feat_type = feat_type, - provenance = expr_values@provenance, - misc = list(expr_values_used = expression_values, - logbase = logbase, - cluster_column_used = cluster_column, - number_of_cells_per_spot = n_cell, - used_cut_off = cutoff)) - - ## return object or results ## - if(return_gobject == TRUE) { - - spenr_names = list_spatial_enrichments_names(gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - if(name %in% spenr_names) { - cat('\n ', name, ' has already been used, will be overwritten \n') - } - - ## update parameters used ## - parameters_list = gobject@parameters - number_of_rounds = length(parameters_list) - update_name = paste0(number_of_rounds,'_spatial_deconvolution') - - # parameters to include - parameters_list[[update_name]] = c('method used' = 'DWLS', - 'deconvolution name' = name, - 'expression values' = expression_values, - 'logbase' = logbase, - 'cluster column used' = cluster_column, - 'number of cells per spot' = n_cell, - 'used cut off' = cutoff) - - gobject@parameters = parameters_list - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = set_spatial_enrichment(gobject = gobject, - spatenrichment = enrObj) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - return(gobject) - - } else { - return(enrObj) - } +runDWLSDeconv <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized"), + logbase = 2, + cluster_column = "leiden_clus", + sign_matrix, + n_cell = 50, + cutoff = 2, + name = NULL, + return_gobject = TRUE) { + # verify if optional package is installed + package_check(pkg_name = "quadprog", repository = "CRAN") + package_check(pkg_name = "Rfast", repository = "CRAN") + + ## check parameters ## + if (is.null(name)) name <- "DWLS" + + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "exprObj" + ) + + # #transform expression data to no log data + nolog_expr <- logbase^(expr_values[]) - 1 + + # cluster column + cell_metadata <- pDataDT(gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + if (!cluster_column %in% colnames(cell_metadata)) { + stop("cluster column not found") + } + cluster <- cell_metadata[[cluster_column]] + + + ##### getting overlapped gene lists + sign_matrix <- as.matrix(sign_matrix) + intersect_gene <- intersect(rownames(sign_matrix), rownames(nolog_expr)) + filter_Sig <- sign_matrix[intersect_gene, ] + filter_expr <- nolog_expr[intersect_gene, ] + filter_log_expr <- expr_values[][intersect_gene, ] + + ##### first round spatial deconvolution ##spot or cluster + enrich_spot_proportion <- enrich_deconvolution( + expr = filter_expr, + log_expr = filter_log_expr, + cluster_info = cluster, + ct_exp = filter_Sig, + cutoff = 
cutoff + ) + + #### re-deconvolution based on spatial resolution + resolution <- (1 / n_cell) + binarize_proportion <- ifelse(enrich_spot_proportion >= resolution, 1, 0) + spot_proportion <- spot_deconvolution( + expr = filter_expr, + cluster_info = cluster, + ct_exp = filter_Sig, + binary_matrix = binarize_proportion + ) + deconvolutionDT <- data.table::data.table( + cell_ID = colnames(spot_proportion)) + deconvolutionDT <- cbind( + deconvolutionDT, data.table::as.data.table(t(spot_proportion))) + + # create spatial enrichment object + enrObj <- create_spat_enr_obj( + name = name, + method = "DWLS", + enrichDT = deconvolutionDT, + spat_unit = spat_unit, + feat_type = feat_type, + provenance = expr_values@provenance, + misc = list( + expr_values_used = expression_values, + logbase = logbase, + cluster_column_used = cluster_column, + number_of_cells_per_spot = n_cell, + used_cut_off = cutoff + ) + ) + + ## return object or results ## + if (return_gobject == TRUE) { + spenr_names <- list_spatial_enrichments_names(gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + if (name %in% spenr_names) { + cat(name, " has already been used, will be overwritten") + } + + ## update parameters used ## + parameters_list <- gobject@parameters + number_of_rounds <- length(parameters_list) + update_name <- paste0(number_of_rounds, "_spatial_deconvolution") + + # parameters to include + parameters_list[[update_name]] <- c( + "method used" = "DWLS", + "deconvolution name" = name, + "expression values" = expression_values, + "logbase" = logbase, + "cluster column used" = cluster_column, + "number of cells per spot" = n_cell, + "used cut off" = cutoff + ) + + gobject@parameters <- parameters_list + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_spatial_enrichment( + gobject = gobject, + spatenrichment = enrObj + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + return(gobject) + } else { + return(enrObj) + } } #' @title runSpatialDeconv #' @name runSpatialDeconv -#' @description Function to perform deconvolution based on single cell expression data +#' @description Function to perform deconvolution based on single cell +#' expression data #' @param gobject giotto object #' @param spat_unit spatial unit #' @param feat_type feature type @@ -2675,38 +2920,35 @@ runDWLSDeconv = function(gobject, #' @seealso \code{\link{runDWLSDeconv}} #' @export runSpatialDeconv <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - deconv_method = c('DWLS'), - expression_values = c('normalized'), - logbase = 2, - cluster_column = 'leiden_clus', - sign_matrix, - n_cell = 50, - cutoff = 2, - name = NULL, - return_gobject = TRUE) { - - - deconv_method = match.arg(deconv_method, choices = c('DWLS')) - - - if(deconv_method == 'DWLS') { - - results = runDWLSDeconv(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expression_values = expression_values, - logbase = logbase, - cluster_column = cluster_column, - sign_matrix = sign_matrix, - n_cell = n_cell, - cutoff = cutoff, - name = name, - return_gobject = return_gobject) - } - - return(results) + spat_unit = NULL, + feat_type = NULL, + deconv_method = c("DWLS"), + expression_values = c("normalized"), + logbase = 2, + cluster_column = "leiden_clus", + sign_matrix, + n_cell = 50, + cutoff = 2, + name = NULL, + return_gobject = TRUE) { + deconv_method <- match.arg(deconv_method, choices = c("DWLS")) + + + if (deconv_method == "DWLS") { + results <- runDWLSDeconv( + gobject = 
gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expression_values = expression_values, + logbase = logbase, + cluster_column = cluster_column, + sign_matrix = sign_matrix, + n_cell = n_cell, + cutoff = cutoff, + name = name, + return_gobject = return_gobject + ) + } + return(results) } - diff --git a/R/spatial_enrichment_visuals.R b/R/spatial_enrichment_visuals.R index 7ddfe2186..8dffa9860 100644 --- a/R/spatial_enrichment_visuals.R +++ b/R/spatial_enrichment_visuals.R @@ -1,9 +1,9 @@ - #' @title findCellTypesFromEnrichment #' @name findCellTypesFromEnrichment #' @param gobject Giotto Object #' @param spat_unit spatial unit in which the enrichment information is stored -#' @param feat_type feature type for which the enrichment information was calculated +#' @param feat_type feature type for which the enrichment information was +#' calculated #' @param enrichment_name name of the spatial enrichment #' i.e. output from GiottoClass::list_spatial_enrichment_names() #' Default value is "PAGE_Z_score" @@ -24,52 +24,61 @@ #' #' @export findCellTypesFromEnrichment <- function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL, - enrichment_name = "PAGE_z_score", - return_frequency_table = FALSE){ + spat_unit = NULL, + feat_type = NULL, + enrichment_name = "PAGE_z_score", + return_frequency_table = FALSE) { # guard clauses - if(!inherits(gobject, "giotto")) stop("gobject needs to be a giotto object") + if (!inherits(gobject, "giotto")) + stop("gobject needs to be a giotto object") - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) # data.table variables - probable_cell_type = cell_ID = NULL + probable_cell_type <- cell_ID <- NULL # extract p-value or z-socre from provided enrichment - pz_enrich = getSpatialEnrichment(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - name = enrichment_name) + pz_enrich <- getSpatialEnrichment(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = enrichment_name + ) - enrich_is_p_value = pz_enrich@misc$p_values_calculated + enrich_is_p_value <- pz_enrich@misc$p_values_calculated - if( colnames(pz_enrich)[[1]] != "cell_ID"){ - selected_cols = colnames(pz_enrich)[colnames(pz_enrich) != "cell_ID"] + if (colnames(pz_enrich)[[1]] != "cell_ID") { + selected_cols <- colnames(pz_enrich)[colnames(pz_enrich) != "cell_ID"] data.table::setcolorder(pz_enrich, c("cell_ID", selected_cols)) } - n_c = ncol(pz_enrich) + n_c <- ncol(pz_enrich) # Find the cell type column that corresponds to the # maximum value within a row and assign it into a # new column, mapping a cell to it's most likely type - if(enrich_is_p_value){ - pz_enrich[, probable_cell_type := names(.SD)[max.col(-.SD)], .SDcols = 2:n_c] + if (enrich_is_p_value) { + pz_enrich[, probable_cell_type := names( + .SD)[max.col(-.SD)], .SDcols = 2:n_c] } else { - pz_enrich[, probable_cell_type := names(.SD)[max.col(.SD)], .SDcols = 2:n_c] + pz_enrich[, probable_cell_type := names( + .SD)[max.col(.SD)], .SDcols = 2:n_c] } - cell_ID_and_types_pz_enrich = pz_enrich[, .(cell_ID, probable_cell_type)] + cell_ID_and_types_pz_enrich <- pz_enrich[, .(cell_ID, probable_cell_type)] - if(return_frequency_table) { - pz_enrich_cell_type_frequencies = 
table(cell_ID_and_types_pz_enrich$probable_cell_type) + if (return_frequency_table) { + pz_enrich_cell_type_frequencies <- table( + cell_ID_and_types_pz_enrich$probable_cell_type) return(pz_enrich_cell_type_frequencies) } @@ -80,7 +89,8 @@ findCellTypesFromEnrichment <- function(gobject = NULL, #' @name plotCellTypesFromEnrichment #' @param gobject Giotto Object #' @param spat_unit spatial unit in which the enrichment information is stored -#' @param feat_type feature type for which the enrichment information was calculated +#' @param feat_type feature type for which the enrichment information was +#' calculated #' @param enrichment_name name of the spatial enrichment #' i.e. output from GiottoClass::list_spatial_enrichment_names() #' Default value is "PAGE_Z_score" @@ -90,52 +100,60 @@ findCellTypesFromEnrichment <- function(gobject = NULL, #' @details #' #' This function generates a bar plot of cell types vs the frequency -#' of that cell type in the data. These cell type resutls are +#' of that cell type in the data. These cell type results are #' based on the provided `enrichment_name`, and will be determined -#' by the maximum value of the z-score or p-value for a given cell or annotation. +#' by the maximum value of the z-score or p-value for a given cell or +#' annotation. #' #' @export plotCellTypesFromEnrichment <- function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL, - enrichment_name = "PAGE_z_score", - title = NULL, - save_param = list(), - default_save_name = 'cell_types_from_enrichment', - save_plot = NULL, - show_plot = NULL, - return_plot = NULL){ + spat_unit = NULL, + feat_type = NULL, + enrichment_name = "PAGE_z_score", + title = NULL, + save_param = list(), + default_save_name = "cell_types_from_enrichment", + save_plot = NULL, + show_plot = NULL, + return_plot = NULL) { # guard clauses handled at first step downstream # therefore, omitting here. 
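# --- Usage sketch (editor's example, not from the patch) for the
# enrichment-based cell-typing helpers in this file; assumes `g` is a giotto
# object that already holds a "PAGE_z_score" spatial enrichment, the
# documented default name.
types <- findCellTypesFromEnrichment(
    gobject = g,
    enrichment_name = "PAGE_z_score"
)
# the same call with return_frequency_table = TRUE gives per-cell-type counts
freqs <- findCellTypesFromEnrichment(
    gobject = g,
    enrichment_name = "PAGE_z_score",
    return_frequency_table = TRUE
)
# bar plot of the assigned cell types
plotCellTypesFromEnrichment(gobject = g, enrichment_name = "PAGE_z_score")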
- id_and_types = findCellTypesFromEnrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - enrichment_name = enrichment_name, - return_frequency_table = FALSE) + id_and_types <- findCellTypesFromEnrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + enrichment_name = enrichment_name, + return_frequency_table = FALSE + ) # data.table column - probable_cell_type = NULL + probable_cell_type <- NULL - if(is.null(title)) title = paste0(spat_unit,"cell types (maximum", enrichment_name, ")") + if (is.null(title)) title <- paste0( + spat_unit, "cell types (maximum", enrichment_name, ")") pl <- ggplot2::ggplot(id_and_types, aes(x = probable_cell_type)) + - ggplot2::geom_bar() + - ggplot2::theme(axis.text.x = element_text(angle = 45), - axis.ticks.length.x =unit(1.5, "cm")) + - ggplot2::labs(title = title, - x = "Cell Type", - y = "Frequency") + ggplot2::geom_bar() + + ggplot2::theme( + axis.text.x = element_text(angle = 45), + axis.ticks.length.x = unit(1.5, "cm") + ) + + ggplot2::labs( + title = title, + x = "Cell Type", + y = "Frequency" + ) # output plot return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL )) } @@ -143,7 +161,8 @@ plotCellTypesFromEnrichment <- function(gobject = NULL, #' @name pieCellTypesFromEnrichment #' @param gobject Giotto Object #' @param spat_unit spatial unit in which the enrichment information is stored -#' @param feat_type feature type for which the enrichment information was calculated +#' @param feat_type feature type for which the enrichment information was +#' calculated #' @param enrichment_name name of the spatial enrichment #' i.e. 
output from GiottoClass::list_spatial_enrichment_names() #' Default value is "PAGE_Z_score" @@ -159,60 +178,69 @@ plotCellTypesFromEnrichment <- function(gobject = NULL, #' #' @export pieCellTypesFromEnrichment <- function(gobject = NULL, - spat_unit = NULL, - feat_type = NULL, - enrichment_name = "PAGE_z_score", - title = NULL, - save_param = list(), - default_save_name = 'cell_types_from_enrichment_pie', - save_plot = NULL, - show_plot = NULL, - return_plot = NULL){ + spat_unit = NULL, + feat_type = NULL, + enrichment_name = "PAGE_z_score", + title = NULL, + save_param = list(), + default_save_name = "cell_types_from_enrichment_pie", + save_plot = NULL, + show_plot = NULL, + return_plot = NULL) { # guard clauses handled one step downstream - freq_table = findCellTypesFromEnrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - enrichment_name = enrichment_name, - return_frequency_table = TRUE) + freq_table <- findCellTypesFromEnrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + enrichment_name = enrichment_name, + return_frequency_table = TRUE + ) - freq_dt = data.table::data.table(freq_table) + freq_dt <- data.table::data.table(freq_table) - colnames(freq_dt) = c("cell_type", "num_cells") + colnames(freq_dt) <- c("cell_type", "num_cells") # data.table vars - cell_type = num_cells = perc = NULL + cell_type <- num_cells <- perc <- NULL - cell_types = unique(freq_dt$cell_type) - total_cells = sum(freq_dt$num_cells) + cell_types <- unique(freq_dt$cell_type) + total_cells <- sum(freq_dt$num_cells) - for ( i in cell_types){ - # hackish, admittedly - nullvar = freq_dt[cell_type == i, perc := num_cells/sum(freq_dt$num_cells) * 100] + for (i in cell_types) { + # hackish, admittedly + nullvar <- freq_dt[cell_type == i, perc := num_cells / sum( + freq_dt$num_cells) * 100] } rm(nullvar) # saves memory - pl = ggplot2::ggplot(as.data.frame(freq_dt), - aes(x = "", - y = perc, - fill = cell_type)) + - geom_bar(stat = "identity", width = 1) + - coord_polar("y", start = 0) + - scale_fill_manual(values = getDistinctColors(length(cell_types))) + - theme_void() + - labs(title = paste(spat_unit, - " Cell Types (", - as.character(total_cells), - " Cells)")) + pl <- ggplot2::ggplot( + as.data.frame(freq_dt), + aes( + x = "", + y = perc, + fill = cell_type + ) + ) + + geom_bar(stat = "identity", width = 1) + + coord_polar("y", start = 0) + + scale_fill_manual(values = getDistinctColors(length(cell_types))) + + theme_void() + + labs(title = paste( + spat_unit, + " Cell Types (", + as.character(total_cells), + " Cells)" + )) # output plot return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL )) } diff --git a/R/spatial_genes.R b/R/spatial_genes.R index 691a70b3f..cd8babc05 100644 --- a/R/spatial_genes.R +++ b/R/spatial_genes.R @@ -1,5 +1,3 @@ - - ## spatial gene detection #### #' @title Fisher exact test @@ -9,140 +7,149 @@ NULL #' @rdname spat_fisher_exact #' @keywords internal -.spat_fish_func = function(feat, - bin_matrix, - spat_mat, - calc_hub = F, - hub_min_int = 3) { - - feat_vector = bin_matrix[rownames(bin_matrix) == feat,] - - feat_vectorA = 
feat_vector[names(feat_vector) %in% rownames(spat_mat)] - feat_vectorA = feat_vectorA[match(rownames(spat_mat), names(feat_vectorA))] - - feat_vectorB = feat_vector[names(feat_vector) %in% colnames(spat_mat)] - feat_vectorB = feat_vectorB[match(colnames(spat_mat), names(feat_vectorB))] - - test1 = spat_mat*feat_vectorA - test2 = t_flex(t_flex(spat_mat)*feat_vectorB) - - sourcevalues = test1[spat_mat == 1] - targetvalues = test2[spat_mat == 1] +.spat_fish_func <- function(feat, + bin_matrix, + spat_mat, + calc_hub = FALSE, + hub_min_int = 3) { + feat_vector <- bin_matrix[rownames(bin_matrix) == feat, ] - # option 1 - test = paste0(sourcevalues,'-',targetvalues) + feat_vectorA <- feat_vector[names(feat_vector) %in% rownames(spat_mat)] + feat_vectorA <- feat_vectorA[match(rownames(spat_mat), names(feat_vectorA))] + feat_vectorB <- feat_vector[names(feat_vector) %in% colnames(spat_mat)] + feat_vectorB <- feat_vectorB[match(colnames(spat_mat), names(feat_vectorB))] - if(length(unique(test)) < 4) { + test1 <- spat_mat * feat_vectorA + test2 <- t_flex(t_flex(spat_mat) * feat_vectorB) - possibs = c("1-1","0-1","1-0","0-0") - missings_possibs = possibs[!possibs %in% unique(test)] - test = c(test, missings_possibs) + sourcevalues <- test1[spat_mat == 1] + targetvalues <- test2[spat_mat == 1] - table_test = table(test) - table_test[names(table_test) %in% missings_possibs] = 0 - table_matrix = matrix(table_test, byrow = T, nrow = 2) + # option 1 + test <- paste0(sourcevalues, "-", targetvalues) - } else { - table_matrix = matrix(table(test), byrow = T, nrow = 2) - } - if(calc_hub == TRUE) { - high_cells = names(feat_vector[feat_vector == 1]) - subset_spat_mat = spat_mat[rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% high_cells] + if (length(unique(test)) < 4) { + possibs <- c("1-1", "0-1", "1-0", "0-0") + missings_possibs <- possibs[!possibs %in% unique(test)] + test <- c(test, missings_possibs) - if(length(subset_spat_mat) == 1) { - hub_nr = 0 + table_test <- table(test) + table_test[names(table_test) %in% missings_possibs] <- 0 + table_matrix <- matrix(table_test, byrow = TRUE, nrow = 2) } else { - subset_spat_mat = spat_mat[rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% high_cells] - rowhubs = rowSums_flex(subset_spat_mat) - colhubs = colSums_flex(subset_spat_mat) - hub_nr = length(unique(c(names(colhubs[colhubs > hub_min_int]), names(rowhubs[colhubs > hub_min_int])))) + table_matrix <- matrix(table(test), byrow = TRUE, nrow = 2) } - fish_res = stats::fisher.test(table_matrix)[c('p.value','estimate')] - return(c(feats = list(feat), fish_res, hubs = list(hub_nr))) - - } else { - - fish_res = stats::fisher.test(table_matrix)[c('p.value','estimate')] - return(c(feats = list(feat), fish_res)) - } + if (calc_hub == TRUE) { + high_cells <- names(feat_vector[feat_vector == 1]) + subset_spat_mat <- spat_mat[ + rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% + high_cells] + + if (length(subset_spat_mat) == 1) { + hub_nr <- 0 + } else { + subset_spat_mat <- spat_mat[ + rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% + high_cells] + rowhubs <- rowSums_flex(subset_spat_mat) + colhubs <- colSums_flex(subset_spat_mat) + hub_nr <- length(unique(c( + names(colhubs[colhubs > hub_min_int]), + names(rowhubs[colhubs > hub_min_int])))) + } + fish_res <- stats::fisher.test(table_matrix)[c("p.value", "estimate")] + return(c(feats = list(feat), fish_res, hubs = list(hub_nr))) + } else { + fish_res <- stats::fisher.test(table_matrix)[c("p.value", "estimate")] + 
return(c(feats = list(feat), fish_res)) + } } #' @describeIn spat_fisher_exact data.table implementation #' @keywords internal -.spat_fish_func_dt = function(bin_matrix_DTm, - spat_netw_min, - calc_hub = F, - hub_min_int = 3, - cores = NA) { - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - # data.table variables - from_value = to_value = feat_ID = N = to = from = cell_ID = V1 = NULL - - # get binarized expression values for the neighbors - spatial_network_min_ext = data.table::merge.data.table(spat_netw_min, bin_matrix_DTm, by.x = 'from', by.y = 'variable', allow.cartesian = T) - data.table::setnames(spatial_network_min_ext, 'value', 'from_value') - - spatial_network_min_ext = data.table::merge.data.table(spatial_network_min_ext, by.x = c('to', 'feat_ID'), bin_matrix_DTm, by.y = c('variable', 'feat_ID')) - data.table::setnames(spatial_network_min_ext, 'value', 'to_value') - - - # summarize the different combinations - spatial_network_min_ext[, combn := paste0(from_value,'-',to_value)] - freq_summary = spatial_network_min_ext[, .N, by = .(feat_ID, combn)] - data.table::setorder(freq_summary, feat_ID, combn) - - feats = unique(freq_summary$feat_ID) - all_combn = c('0-0', '0-1', '1-0', '1-1') - - # create a zeroes DT to add missing observations - freq_summary_zeroes = data.table::data.table(feat_ID = rep(feats, each = 4), - combn = rep(all_combn, length(feats)), - N = 0) - freq_summary2 = rbind(freq_summary, freq_summary_zeroes) - freq_summary2[, N := sum(N), by = .(feat_ID, combn)] - freq_summary2 = unique(freq_summary2) - - # sort the combinations and run fisher test - data.table::setorder(freq_summary2, feat_ID, combn, -N) - fish_results = freq_summary2[, stats::fisher.test(matrix(N, nrow = 2))[c(1,3)], by = feat_ID] - +.spat_fish_func_dt <- function(bin_matrix_DTm, + spat_netw_min, + calc_hub = FALSE, + hub_min_int = 3, + cores = NA) { + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) - ## hubs ## - if(calc_hub == TRUE) { + # data.table variables + from_value <- to_value <- feat_ID <- N <- to <- from <- cell_ID <- + V1 <- NULL + + # get binarized expression values for the neighbors + spatial_network_min_ext <- data.table::merge.data.table( + spat_netw_min, + bin_matrix_DTm, + by.x = "from", + by.y = "variable", + allow.cartesian = TRUE) + data.table::setnames(spatial_network_min_ext, "value", "from_value") + + spatial_network_min_ext <- data.table::merge.data.table( + spatial_network_min_ext, by.x = c("to", "feat_ID"), + bin_matrix_DTm, by.y = c("variable", "feat_ID")) + data.table::setnames(spatial_network_min_ext, "value", "to_value") + + + # summarize the different combinations + spatial_network_min_ext[, combn := paste0(from_value, "-", to_value)] + freq_summary <- spatial_network_min_ext[, .N, by = .(feat_ID, combn)] + data.table::setorder(freq_summary, feat_ID, combn) + + feats <- unique(freq_summary$feat_ID) + all_combn <- c("0-0", "0-1", "1-0", "1-1") + + # create a zeroes DT to add missing observations + freq_summary_zeroes <- data.table::data.table( + feat_ID = rep(feats, each = 4), + combn = rep(all_combn, length(feats)), + N = 0 + ) + freq_summary2 <- rbind(freq_summary, freq_summary_zeroes) + freq_summary2[, N := sum(N), by = .(feat_ID, combn)] + freq_summary2 <- unique(freq_summary2) - double_pos = spatial_network_min_ext[combn == '1-1'] + # sort the combinations and run fisher test + 
data.table::setorder(freq_summary2, feat_ID, combn, -N) + fish_results <- freq_summary2[, stats::fisher.test( + matrix(N, nrow = 2))[c(1, 3)], by = feat_ID] - double_pos_to = double_pos[, .N, by = .(feat_ID, to)] - data.table::setnames(double_pos_to, 'to', 'cell_ID') - double_pos_from = double_pos[, .N, by = .(feat_ID, from)] - data.table::setnames(double_pos_from, 'from', 'cell_ID') - double_pos_both = rbind(double_pos_to, double_pos_from) - double_pos_both = double_pos_both[, sum(N), by = .(feat_ID, cell_ID)] - data.table::setorder(double_pos_both, feat_ID, -V1) + ## hubs ## + if (calc_hub == TRUE) { + double_pos <- spatial_network_min_ext[combn == "1-1"] - # get hubs and add 0's - hub_DT = double_pos_both[V1 > hub_min_int, .N, by = feat_ID] - hub_DT_zeroes = data.table::data.table(feat_ID = unique(spatial_network_min_ext$feat_ID), N = 0) - hub_DT2 = rbind(hub_DT, hub_DT_zeroes) + double_pos_to <- double_pos[, .N, by = .(feat_ID, to)] + data.table::setnames(double_pos_to, "to", "cell_ID") + double_pos_from <- double_pos[, .N, by = .(feat_ID, from)] + data.table::setnames(double_pos_from, "from", "cell_ID") - hub_DT2 = hub_DT2[, sum(N), by = feat_ID] - data.table::setnames(hub_DT2, 'V1', 'hub_nr') + double_pos_both <- rbind(double_pos_to, double_pos_from) + double_pos_both <- double_pos_both[, sum(N), by = .(feat_ID, cell_ID)] + data.table::setorder(double_pos_both, feat_ID, -V1) - fish_results = data.table::merge.data.table(fish_results, hub_DT2, by = 'feat_ID') + # get hubs and add 0's + hub_DT <- double_pos_both[V1 > hub_min_int, .N, by = feat_ID] + hub_DT_zeroes <- data.table::data.table(feat_ID = unique( + spatial_network_min_ext$feat_ID), N = 0) + hub_DT2 <- rbind(hub_DT, hub_DT_zeroes) - } + hub_DT2 <- hub_DT2[, sum(N), by = feat_ID] + data.table::setnames(hub_DT2, "V1", "hub_nr") - return(fish_results) + fish_results <- data.table::merge.data.table( + fish_results, hub_DT2, by = "feat_ID") + } + return(fish_results) } @@ -155,145 +162,153 @@ NULL #' @rdname spat_odds_ratio #' @keywords internal -.spat_or_func = function(feat, - bin_matrix, - spat_mat, - calc_hub = FALSE, - hub_min_int = 3) { - - feat_vector = bin_matrix[rownames(bin_matrix) == feat,] - - feat_vectorA = feat_vector[names(feat_vector) %in% rownames(spat_mat)] - feat_vectorA = feat_vectorA[match(rownames(spat_mat), names(feat_vectorA))] - - feat_vectorB = feat_vector[names(feat_vector) %in% colnames(spat_mat)] - feat_vectorB = feat_vectorB[match(colnames(spat_mat), names(feat_vectorB))] - - test1 = spat_mat*feat_vectorA - test2 = t_flex(t_flex(spat_mat)*feat_vectorB) - - sourcevalues = test1[spat_mat == 1] - targetvalues = test2[spat_mat == 1] - - # option 1 - test = paste0(sourcevalues,'-',targetvalues) +.spat_or_func <- function(feat, + bin_matrix, + spat_mat, + calc_hub = FALSE, + hub_min_int = 3) { + feat_vector <- bin_matrix[rownames(bin_matrix) == feat, ] + feat_vectorA <- feat_vector[names(feat_vector) %in% rownames(spat_mat)] + feat_vectorA <- feat_vectorA[match(rownames(spat_mat), names(feat_vectorA))] - if(length(unique(test)) < 4) { + feat_vectorB <- feat_vector[names(feat_vector) %in% colnames(spat_mat)] + feat_vectorB <- feat_vectorB[match(colnames(spat_mat), names(feat_vectorB))] - possibs = c("1-1","0-1","1-0","0-0") - missings_possibs = possibs[!possibs %in% unique(test)] - test = c(test, missings_possibs) + test1 <- spat_mat * feat_vectorA + test2 <- t_flex(t_flex(spat_mat) * feat_vectorB) - table_test = table(test) - table_test[names(table_test) %in% missings_possibs] = 0 - table_matrix = 
matrix(table_test, byrow = T, nrow = 2) + sourcevalues <- test1[spat_mat == 1] + targetvalues <- test2[spat_mat == 1] - } else { - table_matrix = matrix(table(test), byrow = T, nrow = 2) - } + # option 1 + test <- paste0(sourcevalues, "-", targetvalues) - if(calc_hub == TRUE) { - high_cells = names(feat_vector[feat_vector == 1]) - subset_spat_mat = spat_mat[rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% high_cells] + if (length(unique(test)) < 4) { + possibs <- c("1-1", "0-1", "1-0", "0-0") + missings_possibs <- possibs[!possibs %in% unique(test)] + test <- c(test, missings_possibs) - if(length(subset_spat_mat) == 1) { - hub_nr = 0 + table_test <- table(test) + table_test[names(table_test) %in% missings_possibs] <- 0 + table_matrix <- matrix(table_test, byrow = TRUE, nrow = 2) } else { - rowhubs = rowSums_flex(subset_spat_mat) - colhubs = colSums_flex(subset_spat_mat) - hub_nr = length(unique(c(names(colhubs[colhubs > hub_min_int]), names(rowhubs[colhubs > hub_min_int])))) + table_matrix <- matrix(table(test), byrow = TRUE, nrow = 2) } - fish_matrix = table_matrix - fish_matrix = fish_matrix/1000 - OR = ((fish_matrix[1]*fish_matrix[4]) / (fish_matrix[2]*fish_matrix[3])) - return(c(feats = list(feat), OR, hubs = list(hub_nr))) + if (calc_hub == TRUE) { + high_cells <- names(feat_vector[feat_vector == 1]) + subset_spat_mat <- spat_mat[ + rownames(spat_mat) %in% high_cells, colnames(spat_mat) %in% + high_cells] + + if (length(subset_spat_mat) == 1) { + hub_nr <- 0 + } else { + rowhubs <- rowSums_flex(subset_spat_mat) + colhubs <- colSums_flex(subset_spat_mat) + hub_nr <- length(unique(c(names( + colhubs[colhubs > hub_min_int]), names( + rowhubs[colhubs > hub_min_int])))) + } - } + fish_matrix <- table_matrix + fish_matrix <- fish_matrix / 1000 + OR <- ((fish_matrix[1] * fish_matrix[4]) / + (fish_matrix[2] * fish_matrix[3])) - fish_matrix = table_matrix - fish_matrix = fish_matrix/1000 - OR = ((fish_matrix[1]*fish_matrix[4]) / (fish_matrix[2]*fish_matrix[3])) - return(c(feats = list(feat), OR)) + return(c(feats = list(feat), OR, hubs = list(hub_nr))) + } + fish_matrix <- table_matrix + fish_matrix <- fish_matrix / 1000 + OR <- ((fish_matrix[1] * fish_matrix[4]) / ( + fish_matrix[2] * fish_matrix[3])) + return(c(feats = list(feat), OR)) } #' @describeIn spat_odds_ratio data.table implementation #' @keywords internal -.spat_or_func_dt = function(bin_matrix_DTm, - spat_netw_min, - calc_hub = F, - hub_min_int = 3, - cores = NA) { - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - # data.table variables - from_value = to_value = feat_ID = N = to = from = cell_ID = V1 = NULL - - # get binarized expression values for the neighbors - spatial_network_min_ext = data.table::merge.data.table(spat_netw_min, bin_matrix_DTm, by.x = 'from', by.y = 'variable', allow.cartesian = T) - data.table::setnames(spatial_network_min_ext, 'value', 'from_value') - - spatial_network_min_ext = data.table::merge.data.table(spatial_network_min_ext, by.x = c('to', 'feat_ID'), bin_matrix_DTm, by.y = c('variable', 'feat_ID')) - data.table::setnames(spatial_network_min_ext, 'value', 'to_value') - - - # summarize the different combinations - spatial_network_min_ext[, combn := paste0(from_value,'-',to_value)] - freq_summary = spatial_network_min_ext[, .N, by = .(feat_ID, combn)] - data.table::setorder(freq_summary, feat_ID, combn) - - feats = unique(freq_summary$feat_ID) - all_combn = c('0-0', '0-1', '1-0', '1-1') - - # create a 
zeroes DT to add missing observations - freq_summary_zeroes = data.table::data.table(feat_ID = rep(feats, each = 4), - combn = rep(all_combn, length(feats)), - N = 0) - freq_summary2 = rbind(freq_summary, freq_summary_zeroes) - freq_summary2[, N := sum(N), by = .(feat_ID, combn)] - freq_summary2 = unique(freq_summary2) - - # sort the combinations and run fisher test - setorder(freq_summary2, feat_ID, combn, -N) - or_results = freq_summary2[, .or_test_func(matrix(N, nrow = 2)), by = feat_ID] - +.spat_or_func_dt <- function(bin_matrix_DTm, + spat_netw_min, + calc_hub = FALSE, + hub_min_int = 3, + cores = NA) { + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) - ## hubs ## - if(calc_hub == TRUE) { + # data.table variables + from_value <- to_value <- feat_ID <- N <- to <- from <- cell_ID <- + V1 <- NULL + + # get binarized expression values for the neighbors + spatial_network_min_ext <- data.table::merge.data.table( + spat_netw_min, bin_matrix_DTm, + by.x = "from", by.y = "variable", + allow.cartesian = TRUE) + data.table::setnames(spatial_network_min_ext, "value", "from_value") + + spatial_network_min_ext <- data.table::merge.data.table( + spatial_network_min_ext, by.x = c("to", "feat_ID"), + bin_matrix_DTm, by.y = c("variable", "feat_ID")) + data.table::setnames(spatial_network_min_ext, "value", "to_value") + + + # summarize the different combinations + spatial_network_min_ext[, combn := paste0(from_value, "-", to_value)] + freq_summary <- spatial_network_min_ext[, .N, by = .(feat_ID, combn)] + data.table::setorder(freq_summary, feat_ID, combn) + + feats <- unique(freq_summary$feat_ID) + all_combn <- c("0-0", "0-1", "1-0", "1-1") + + # create a zeroes DT to add missing observations + freq_summary_zeroes <- data.table::data.table( + feat_ID = rep(feats, each = 4), + combn = rep(all_combn, length(feats)), + N = 0 + ) + freq_summary2 <- rbind(freq_summary, freq_summary_zeroes) + freq_summary2[, N := sum(N), by = .(feat_ID, combn)] + freq_summary2 <- unique(freq_summary2) - double_pos = spatial_network_min_ext[combn == '1-1'] + # sort the combinations and run fisher test + setorder(freq_summary2, feat_ID, combn, -N) + or_results <- freq_summary2[ + , .or_test_func(matrix(N, nrow = 2)), by = feat_ID] - double_pos_to = double_pos[, .N, by = .(feat_ID, to)] - data.table::setnames(double_pos_to, 'to', 'cell_ID') - double_pos_from = double_pos[, .N, by = .(feat_ID, from)] - data.table::setnames(double_pos_from, 'from', 'cell_ID') - double_pos_both = rbind(double_pos_to, double_pos_from) - double_pos_both = double_pos_both[, sum(N), by = .(feat_ID, cell_ID)] - data.table::setorder(double_pos_both, feat_ID, -V1) + ## hubs ## + if (calc_hub == TRUE) { + double_pos <- spatial_network_min_ext[combn == "1-1"] - # get hubs and add 0's - hub_DT = double_pos_both[V1 > hub_min_int, .N, by = feat_ID] - hub_DT_zeroes = data.table::data.table(feat_ID = unique(spatial_network_min_ext$feat_ID), N = 0) - hub_DT2 = rbind(hub_DT, hub_DT_zeroes) + double_pos_to <- double_pos[, .N, by = .(feat_ID, to)] + data.table::setnames(double_pos_to, "to", "cell_ID") + double_pos_from <- double_pos[, .N, by = .(feat_ID, from)] + data.table::setnames(double_pos_from, "from", "cell_ID") - hub_DT2 = hub_DT2[, sum(N), by = feat_ID] - data.table::setnames(hub_DT2, 'V1', 'hub_nr') + double_pos_both <- rbind(double_pos_to, double_pos_from) + double_pos_both <- double_pos_both[, sum(N), by = .(feat_ID, cell_ID)] + data.table::setorder(double_pos_both, 
feat_ID, -V1) - or_results = data.table::merge.data.table(or_results, hub_DT2, by = 'feat_ID') + # get hubs and add 0's + hub_DT <- double_pos_both[V1 > hub_min_int, .N, by = feat_ID] + hub_DT_zeroes <- data.table::data.table( + feat_ID = unique(spatial_network_min_ext$feat_ID), N = 0) + hub_DT2 <- rbind(hub_DT, hub_DT_zeroes) - } + hub_DT2 <- hub_DT2[, sum(N), by = feat_ID] + data.table::setnames(hub_DT2, "V1", "hub_nr") - return(or_results) + or_results <- data.table::merge.data.table( + or_results, hub_DT2, by = "feat_ID") + } + return(or_results) } @@ -301,9 +316,9 @@ NULL #' @name .or_test_func #' @description calculate odds-ratio from a 2x2 matrix #' @keywords internal -.or_test_func = function(matrix) { - OR = ((matrix[1]*matrix[4]) / (matrix[2]*matrix[3])) - list('estimate' = OR) +.or_test_func <- function(matrix) { + OR <- ((matrix[1] * matrix[4]) / (matrix[2] * matrix[3])) + list("estimate" = OR) } @@ -314,279 +329,273 @@ NULL NULL -#' @describeIn calculate_spatial_enrichment calculate using a 'simple' and efficient for loop +#' @describeIn calculate_spatial_enrichment calculate using a 'simple' and +#' efficient for loop #' @keywords internal -.calc_spatial_enrichment_minimum = function(spatial_network, - bin_matrix, - adjust_method = 'fdr', - do_fisher_test = TRUE) { - - # data.table variables - from = to = feats = variable = value = p.value = adj.p.value = score = estimate = NULL - - spatial_network_min = spatial_network[,.(from, to)] - - all_colindex = 1:ncol(bin_matrix) - names(all_colindex) = colnames(bin_matrix) - - # code for possible combinations - convert_code = c(1, 2, 3, 4) - names(convert_code) = c('0-0', '0-1', '1-0', '1-1') - - # preallocate final matrix for results - matrix_res = matrix(data = NA, nrow = nrow(bin_matrix), ncol = nrow(spatial_network_min)) - - ## 1. summarize results for each edge in the network - for(row_i in 1:nrow(spatial_network_min)) { - - from_id = spatial_network_min[row_i][['from']] - to_id = spatial_network_min[row_i][['to']] - - sumres = data.table::as.data.table(bin_matrix[, all_colindex[c(from_id, to_id)]]) - sumres[, combn := paste0(get(from_id),'-',get(to_id))] - - ## maybe a slightly faster alternative ## - #sumres[, sum := get(from_id)+get(to_id)] - #sumres[, combn := ifelse(sum == 0, 1, - # ifelse(sum == 2, 4, - # ifelse(get(from_id) == 1, 3, 2)))] - #code_res = sumres[['combn']] - - code_res = convert_code[sumres$combn] - matrix_res[, row_i] = code_res - } - - rownames(matrix_res) = rownames(bin_matrix) +.calc_spatial_enrichment_minimum <- function(spatial_network, + bin_matrix, + adjust_method = "fdr", + do_fisher_test = TRUE) { + # data.table variables + from <- to <- feats <- variable <- value <- p.value <- adj.p.value <- + score <- estimate <- NULL + spatial_network_min <- spatial_network[, .(from, to)] - # preallocate matrix for table results - table_res = matrix(data = NA, nrow(matrix_res), ncol = 4) + all_colindex <- 1:ncol(bin_matrix) + names(all_colindex) <- colnames(bin_matrix) - ## 2. calculate the frequencies of possible combinations ## - # '0-0' = 1, '0-1' = 2, '1-0' = 3 and '1-1' = 4 - for(row_i in 1:nrow(matrix_res)) { + # code for possible combinations + convert_code <- c(1, 2, 3, 4) + names(convert_code) <- c("0-0", "0-1", "1-0", "1-1") - x = matrix_res[row_i,] - x = factor(x, levels = c(1,2,3,4)) - tabres = as.vector(table(x)) + # preallocate final matrix for results + matrix_res <- matrix( + data = NA, nrow = nrow(bin_matrix), ncol = nrow(spatial_network_min)) - table_res[row_i,] = tabres - } + ## 1. 
summarize results for each edge in the network + for (row_i in 1:nrow(spatial_network_min)) { + from_id <- spatial_network_min[row_i][["from"]] + to_id <- spatial_network_min[row_i][["to"]] - rownames(table_res) = rownames(matrix_res) - colnames(table_res) = 1:4 + sumres <- data.table::as.data.table(bin_matrix[ + , all_colindex[c(from_id, to_id)]]) + sumres[, combn := paste0(get(from_id), "-", get(to_id))] - rable_resDT = data.table::as.data.table(table_res) - rable_resDT[, feats := rownames(table_res)] + code_res <- convert_code[sumres$combn] + matrix_res[, row_i] <- code_res + } - rable_resDTm = data.table::melt.data.table(rable_resDT, id.vars = 'feats') - data.table::setorder(rable_resDTm, feats, variable) + rownames(matrix_res) <- rownames(bin_matrix) - ## run fisher test ## - if(do_fisher_test == TRUE) { - results = rable_resDTm[, stats::fisher.test(matrix(value, nrow = 2))[c(1,3)], by = feats] - # replace zero p-values with lowest p-value - min_pvalue = min(results$p.value[results$p.value > 0]) - results[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] - results[, adj.p.value := stats::p.adjust(p.value, method = adjust_method)] + # preallocate matrix for table results + table_res <- matrix(data = NA, nrow(matrix_res), ncol = 4) - # sort feats based on p-value and estimate - results[, score := -log(p.value) * estimate] - data.table::setorder(results, -score) + ## 2. calculate the frequencies of possible combinations ## + # '0-0' = 1, '0-1' = 2, '1-0' = 3 and '1-1' = 4 + for (row_i in 1:nrow(matrix_res)) { + x <- matrix_res[row_i, ] + x <- factor(x, levels = c(1, 2, 3, 4)) + tabres <- as.vector(table(x)) + table_res[row_i, ] <- tabres + } - } else { + rownames(table_res) <- rownames(matrix_res) + colnames(table_res) <- 1:4 - results = rable_resDTm[, .or_test_func(matrix(value, nrow = 2)), by = feats] - data.table::setorder(results, -estimate) + rable_resDT <- data.table::as.data.table(table_res) + rable_resDT[, feats := rownames(table_res)] - } + rable_resDTm <- data.table::melt.data.table(rable_resDT, id.vars = "feats") + data.table::setorder(rable_resDTm, feats, variable) - return(results) + ## run fisher test ## + if (do_fisher_test == TRUE) { + results <- rable_resDTm[, stats::fisher.test(matrix( + value, nrow = 2))[c(1, 3)], by = feats] + + # replace zero p-values with lowest p-value + min_pvalue <- min(results$p.value[results$p.value > 0]) + results[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] + results[, adj.p.value := stats::p.adjust( + p.value, method = adjust_method)] + + # sort feats based on p-value and estimate + results[, score := -log(p.value) * estimate] + data.table::setorder(results, -score) + } else { + results <- rable_resDTm[, .or_test_func(matrix( + value, nrow = 2)), by = feats] + data.table::setorder(results, -estimate) + } + return(results) } -#' @describeIn calculate_spatial_enrichment calculate using 'matrix' implementation +#' @describeIn calculate_spatial_enrichment calculate using 'matrix' +#' implementation #' @keywords internal -.calc_spatial_enrichment_matrix = function(spatial_network, - bin_matrix, - adjust_method = 'fdr', - do_fisher_test = TRUE, - do_parallel = TRUE, - cores = NA, - calc_hub = FALSE, - hub_min_int = 3, - verbose = TRUE) { - - - # data.table variables - verbose = feats = p.value = estimate = adj.p.value = score = NULL - - # convert spatial network data.table to spatial matrix - dc_spat_network = data.table::dcast.data.table(spatial_network, formula = to~from, value.var = 'distance', fill = 0) - spat_mat = 
dt_to_matrix(dc_spat_network) - spat_mat[spat_mat > 0] = 1 - - - ## parallel - if(do_parallel == TRUE) { - - if(do_fisher_test == TRUE) { - - save_list = suppressMessages(lapply_flex(X = rownames(bin_matrix), cores = cores, fun = .spat_fish_func, - bin_matrix = bin_matrix, spat_mat = spat_mat, - calc_hub = calc_hub, hub_min_int = hub_min_int)) - +.calc_spatial_enrichment_matrix <- function(spatial_network, + bin_matrix, + adjust_method = "fdr", + do_fisher_test = TRUE, + do_parallel = TRUE, + cores = NA, + calc_hub = FALSE, + hub_min_int = 3, + verbose = TRUE) { + # data.table variables + verbose <- feats <- p.value <- estimate <- adj.p.value <- score <- NULL + + # convert spatial network data.table to spatial matrix + dc_spat_network <- data.table::dcast.data.table( + spatial_network, formula = to ~ from, value.var = "distance", fill = 0) + spat_mat <- dt_to_matrix(dc_spat_network) + spat_mat[spat_mat > 0] <- 1 + + + ## parallel + if (do_parallel == TRUE) { + if (do_fisher_test == TRUE) { + save_list <- suppressMessages(lapply_flex( + X = rownames(bin_matrix), cores = cores, fun = .spat_fish_func, + bin_matrix = bin_matrix, spat_mat = spat_mat, + calc_hub = calc_hub, hub_min_int = hub_min_int + )) + } else { + save_list <- suppressMessages(lapply_flex( + X = rownames(bin_matrix), cores = cores, fun = .spat_or_func, + bin_matrix = bin_matrix, spat_mat = spat_mat, + calc_hub = calc_hub, hub_min_int = hub_min_int + )) + } } else { - save_list = suppressMessages(lapply_flex(X = rownames(bin_matrix), cores = cores, fun = .spat_or_func, - bin_matrix = bin_matrix, spat_mat = spat_mat, - calc_hub = calc_hub, hub_min_int = hub_min_int)) - + ## serial + save_list <- list() + + if (do_fisher_test == TRUE) { + for (feat in rownames(bin_matrix)) { + if (verbose == TRUE) print(feat) + + save_list[[feat]] <- suppressMessages(.spat_fish_func( + feat = feat, bin_matrix = bin_matrix, spat_mat = spat_mat, + calc_hub = calc_hub, hub_min_int = hub_min_int + )) + } + } else { + for (feat in rownames(bin_matrix)) { + if (verbose == TRUE) print(feat) + + save_list[[feat]] <- suppressMessages(.spat_or_func( + feat = feat, bin_matrix = bin_matrix, spat_mat = spat_mat, + calc_hub = calc_hub, hub_min_int = hub_min_int + )) + } + } } - } else { + result <- data.table::as.data.table(do.call("rbind", save_list)) + result[, feats := unlist(feats)] - ## serial - save_list = list() - if(do_fisher_test == TRUE) { - for(feat in rownames(bin_matrix)) { - if(verbose == TRUE) print(feat) + if (do_fisher_test == TRUE) { + result[, c("p.value", "estimate") := list( + as.numeric(p.value), as.numeric(estimate))] - save_list[[feat]] = suppressMessages(.spat_fish_func(feat = feat, bin_matrix = bin_matrix, spat_mat = spat_mat, - calc_hub = calc_hub, hub_min_int = hub_min_int)) + # convert p.value = 0 to lowest p-value + min_pvalue <- min(result$p.value[result$p.value > 0]) + result[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] + result[, adj.p.value := stats::p.adjust( + p.value, method = adjust_method)] - } + result[, score := -log(p.value) * estimate] + data.table::setorder(result, -score) } else { - for(feat in rownames(bin_matrix)) { - if(verbose == TRUE) print(feat) - - save_list[[feat]] = suppressMessages(.spat_or_func(feat = feat, bin_matrix = bin_matrix, spat_mat = spat_mat, - calc_hub = calc_hub, hub_min_int = hub_min_int)) - - } + data.table::setnames(result, "V1", "estimate") + data.table::setorder(result, -estimate) } - } - - result = data.table::as.data.table(do.call('rbind', save_list)) - result[, feats := 
unlist(feats)] - - - if(do_fisher_test == TRUE) { - result[, c('p.value', 'estimate') := list(as.numeric(p.value), as.numeric(estimate))] - - # convert p.value = 0 to lowest p-value - min_pvalue = min(result$p.value[result$p.value > 0]) - result[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] - result[, adj.p.value := stats::p.adjust(p.value, method = adjust_method)] - - result[, score := -log(p.value) * estimate] - data.table::setorder(result, -score) - - } else { - - data.table::setnames(result, 'V1', 'estimate') - data.table::setorder(result, -estimate) - } - - return(result) - + return(result) } -#' @describeIn calculate_spatial_enrichment calculate using 'data.table' implementation +#' @describeIn calculate_spatial_enrichment calculate using 'data.table' +#' implementation #' @keywords internal .calc_spatial_enrichment_dt <- function( - bin_matrix, - spatial_network, - calc_hub = FALSE, - hub_min_int = 3, - group_size = 'automatic', - do_fisher_test = TRUE, - adjust_method = 'fdr', - cores = NA -) { - - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) + bin_matrix, + spatial_network, + calc_hub = FALSE, + hub_min_int = 3, + group_size = "automatic", + do_fisher_test = TRUE, + adjust_method = "fdr", + cores = NA) { + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) - # data.table variables - from = to = feat_ID = p.value = adj.p.value = score = estimate = NULL - - # create minimum spatial network - spat_netw_min = spatial_network[,.(from, to)] - - # divide matrix in groups - if (!is.na(group_size) & is.numeric(group_size)) { - group_size = group_size - if(group_size > nrow(bin_matrix)) { - stop('group_size is too big, it can not be greater than the number of feats') + # data.table variables + from <- to <- feat_ID <- p.value <- adj.p.value <- score <- + estimate <- NULL + + # create minimum spatial network + spat_netw_min <- spatial_network[, .(from, to)] + + # divide matrix in groups + if (!is.na(group_size) & is.numeric(group_size)) { + group_size <- group_size + if (group_size > nrow(bin_matrix)) { + stop("group_size is too big, it can not be greater than the + number of feats") + } + } else if (group_size == "automatic") { + test_number <- ceiling(nrow(bin_matrix) / 10) + test_number <- max(2, test_number) + group_size <- min(200, test_number) } - } else if (group_size == 'automatic') { - - test_number = ceiling(nrow(bin_matrix)/10) - test_number = max(2, test_number) - group_size = min(200, test_number) - } - - groups = ceiling(nrow(bin_matrix)/group_size) - cut_groups = cut(1:nrow(bin_matrix), breaks = groups, labels = 1:groups) - if (any(table(cut_groups) == 1)) { - stop('With group size = ', group_size, ' you have a single gene in a group. Manually pick another group size') - } - indexes = 1:nrow(bin_matrix) - names(indexes) = cut_groups + groups <- ceiling(nrow(bin_matrix) / group_size) + cut_groups <- cut(1:nrow(bin_matrix), breaks = groups, labels = 1:groups) + if (any(table(cut_groups) == 1)) { + stop("With group size = ", group_size, + " you have a single gene in a group. 
Manually pick another group + size") + } + indexes <- 1:nrow(bin_matrix) + names(indexes) <- cut_groups + + + total_list <- list() + for (group in unique(cut_groups)) { + sel_indices <- indexes[names(indexes) == group] + + bin_matrix_DT <- data.table::as.data.table(bin_matrix[sel_indices, ]) + bin_matrix_DT[, feat_ID := rownames(bin_matrix[sel_indices, ])] + bin_matrix_DTm <- data.table::melt.data.table( + bin_matrix_DT, id.vars = "feat_ID") + + if (do_fisher_test == TRUE) { + test <- .spat_fish_func_dt( + bin_matrix_DTm = bin_matrix_DTm, + spat_netw_min = spat_netw_min, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + cores = cores + ) + } else { + test <- .spat_or_func_dt( + bin_matrix_DTm = bin_matrix_DTm, + spat_netw_min = spat_netw_min, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + cores = cores + ) + } - total_list = list() - for (group in unique(cut_groups)) { - sel_indices = indexes[names(indexes) == group] + total_list[[group]] <- test + } - bin_matrix_DT = data.table::as.data.table(bin_matrix[sel_indices,]) - bin_matrix_DT[, feat_ID := rownames(bin_matrix[sel_indices,])] - bin_matrix_DTm = data.table::melt.data.table(bin_matrix_DT, id.vars = 'feat_ID') + result <- do.call("rbind", total_list) if (do_fisher_test == TRUE) { - test = .spat_fish_func_dt(bin_matrix_DTm = bin_matrix_DTm, - spat_netw_min = spat_netw_min, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - cores = cores) + min_pvalue <- min(result$p.value[result$p.value > 0]) + result[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] + result[, adj.p.value := stats::p.adjust( + p.value, method = adjust_method)] + + result[, score := -log(p.value) * estimate] + data.table::setorder(result, -score) + data.table::setnames(result, "feat_ID", "feats") } else { - test = .spat_or_func_dt(bin_matrix_DTm = bin_matrix_DTm, - spat_netw_min = spat_netw_min, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - cores = cores) + data.table::setorder(result, -estimate) + data.table::setnames(result, "feat_ID", "feats") } - - total_list[[group]] = test - - } - - result = do.call('rbind', total_list) - - if (do_fisher_test == TRUE) { - min_pvalue = min(result$p.value[result$p.value > 0]) - result[, p.value := ifelse(p.value == 0, min_pvalue, p.value)] - result[, adj.p.value := stats::p.adjust(p.value, method = adjust_method)] - - result[, score := -log(p.value) * estimate] - data.table::setorder(result, -score) - data.table::setnames(result, 'feat_ID', 'feats') - } else { - data.table::setorder(result, -estimate) - data.table::setnames(result, 'feat_ID', 'feats') - } - - return(result) - + return(result) } @@ -594,26 +603,32 @@ NULL #' @title binSpectSingleMatrix #' @name binSpectSingleMatrix -#' @description binSpect for a single spatial network and a provided expression matrix +#' @description binSpect for a single spatial network and a provided +#' expression matrix #' @param expression_matrix expression matrix #' @param spatial_network spatial network in data.table format -#' @param bin_matrix a binarized matrix, when provided it will skip the binarization process +#' @param bin_matrix a binarized matrix, when provided it will skip the +#' binarization process #' @param bin_method method to binarize gene expression #' @param subset_feats only select a subset of features to test -#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset) +#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, +#' kmeans_arma_subset) #' @param nstart kmeans: nstart parameter #' @param 
iter_max kmeans: iter.max parameter #' @param extreme_nr number of top and bottom cells (see details) #' @param sample_nr total number of cells to sample (see details) #' @param percentage_rank percentage of top cells for binarization #' @param do_fisher_test perform fisher test -#' @param adjust_method p-value adjusted method to use (see \code{\link[stats]{p.adjust}}) +#' @param adjust_method p-value adjusted method to use +#' (see \code{\link[stats]{p.adjust}}) #' @param calc_hub calculate the number of hub cells #' @param hub_min_int minimum number of cell-cell interactions for a hub cell -#' @param get_av_expr calculate the average expression per gene of the high expressing cells +#' @param get_av_expr calculate the average expression per gene of the high +#' expressing cells #' @param get_high_expr calculate the number of high expressing cells per gene #' @param implementation enrichment implementation (data.table, simple, matrix) -#' @param group_size number of genes to process together with data.table implementation (default = automatic) +#' @param group_size number of genes to process together with data.table +#' implementation (default = automatic) #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param verbose be verbose @@ -621,7 +636,8 @@ NULL #' @param seed Seed for kmeans binarization. If NULL passed, no seed is set. #' Otherwise, the input value is used as seed. #' @return data.table with results (see details) -#' @details We provide two ways to identify spatial genes based on gene expression binarization. +#' @details We provide two ways to identify spatial genes based on gene +#' expression binarization. #' Both methods are identical except for how binarization is performed. #' \itemize{ #' \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} @@ -642,200 +658,211 @@ NULL #' \item{Average expression of each gene within high expressing cells } #' \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} #' } -#' By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -#' The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -#' The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +#' Selecting a subset of likely spatial genes +#' (e.g. soft thresholding highly variable genes) can speed up the computation. +#' The simple implementation is usually faster, but lacks the possibility to +#' run in parallel and to calculate hub cells. +#' The data.table implementation might be more appropriate for large datasets +#' by setting the group_size (number of genes) parameter to divide the workload.
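
For orientation, the sketch below shows roughly what a call to binSpectSingleMatrix could look like once this refactor is in place. The two-feature expression matrix, the three-edge spatial network and all parameter values are invented for illustration (they are not taken from this patch), and the snippet assumes the Giotto package and data.table are installed and attached; treat it as a shape-of-the-inputs sketch rather than a tested example.

library(data.table)
library(Giotto) # assumption: the package built from this repo is installed

# toy feats x cells matrix (rownames = features, colnames = cell IDs)
expr <- matrix(
    c(5, 4, 0, 0,
      0, 1, 6, 7),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("featA", "featB"), paste0("cell_", 1:4))
)

# minimal spatial network: one row per edge, 'from'/'to' cell IDs
# (a 'distance' column is included since the matrix implementation uses it)
spat_net <- data.table(
    from = c("cell_1", "cell_2", "cell_3"),
    to = c("cell_2", "cell_3", "cell_4"),
    distance = c(1, 1, 1)
)

# rank binarization sidesteps kmeans on such a tiny toy matrix
res <- binSpectSingleMatrix(
    expression_matrix = expr,
    spatial_network = spat_net,
    bin_method = "rank",
    percentage_rank = 30,
    do_fisher_test = TRUE,
    implementation = "data.table",
    verbose = FALSE
)

# 'res' should be a data.table with one row per feature: feats, p.value,
# estimate, adj.p.value and score, plus av_expr / high_expr when
# get_av_expr / get_high_expr are left at their defaults

On real data the gobject-level wrappers (binSpectSingle, binSpectMulti) with the default kmeans binarization and a Delaunay or kNN spatial network are the intended entry points; the toy call above only illustrates the expected input shapes and the columns of the returned data.table.
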
#' @export -binSpectSingleMatrix = function(expression_matrix, - spatial_network = NULL, - bin_matrix = NULL, - bin_method = c('kmeans', 'rank'), - subset_feats = NULL, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - percentage_rank = 30, - do_fisher_test = TRUE, - adjust_method = 'fdr', - calc_hub = FALSE, - hub_min_int = 3, - get_av_expr = TRUE, - get_high_expr = TRUE, - implementation = c('data.table', 'simple', 'matrix'), - group_size = 'automatic', - do_parallel = TRUE, - cores = NA, - verbose = FALSE, - set.seed = deprecated(), - seed = 1234) { - - if (is_present(set.seed) && !is.function(set.seed)) { - deprecate_warn( - when = "4.0.3", - what = "binSpectSingleMatrix(set.seed)", - with = "binSpectSingleMatrix(seed)" - ) - - seed <- set.seed - set.seed <- NULL - } +binSpectSingleMatrix <- function(expression_matrix, + spatial_network = NULL, + bin_matrix = NULL, + bin_method = c("kmeans", "rank"), + subset_feats = NULL, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + percentage_rank = 30, + do_fisher_test = TRUE, + adjust_method = "fdr", + calc_hub = FALSE, + hub_min_int = 3, + get_av_expr = TRUE, + get_high_expr = TRUE, + implementation = c("data.table", "simple", "matrix"), + group_size = "automatic", + do_parallel = TRUE, + cores = NA, + verbose = FALSE, + set.seed = deprecated(), + seed = 1234) { + if (is_present(set.seed) && !is.function(set.seed)) { + deprecate_warn( + when = "4.0.3", + what = "binSpectSingleMatrix(set.seed)", + with = "binSpectSingleMatrix(seed)" + ) + + seed <- set.seed + set.seed <- NULL + } - do_parallel <- as.logical(do_parallel) - calc_hub <- as.logical(calc_hub) - get_av_expr <- as.logical(get_av_expr) - get_high_expr <- as.logical(get_high_expr) - do_fisher_test <- as.logical(do_fisher_test) + do_parallel <- as.logical(do_parallel) + calc_hub <- as.logical(calc_hub) + get_av_expr <- as.logical(get_av_expr) + get_high_expr <- as.logical(get_high_expr) + do_fisher_test <- as.logical(do_fisher_test) - vmsg(.v = verbose, '\n This is the single parameter version of binSpect') + vmsg(.v = verbose, "\n This is the single parameter version of binSpect") - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) - # data.table: set global variable - feats = p.value = estimate = score = NULL + # data.table: set global variable + feats <- p.value <- estimate <- score <- NULL - # set binarization method - bin_method = match.arg(bin_method, choices = c('kmeans', 'rank')) + # set binarization method + bin_method <- match.arg(bin_method, choices = c("kmeans", "rank")) - # kmeans algorithm - kmeans_algo = match.arg(kmeans_algo, choices = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset')) + # kmeans algorithm + kmeans_algo <- match.arg( + kmeans_algo, + choices = c("kmeans", "kmeans_arma", "kmeans_arma_subset")) - # implementation - implementation = match.arg(implementation, choices = c('data.table', 'simple', 'matrix')) + # implementation + implementation <- match.arg( + implementation, choices = c("data.table", "simple", "matrix")) - # spatial network - # TODO: verify binarization of spatial network - if (is.null(spatial_network)) { - stop("You need to provide a spatial network in 
data.table format to the 'spatial_network' parameter \n") - } + # spatial network + # TODO: verify binarization of spatial network + if (is.null(spatial_network)) { + stop("You need to provide a spatial network in data.table format to + the 'spatial_network' parameter") + } - ## start binarization ## - ## ------------------ ## + ## start binarization ## + ## ------------------ ## - if (!is.null(bin_matrix)) { - # TODO: verify format of bin_matrix and compatibility with spatial network - bin_matrix = bin_matrix - } else { + if (!is.null(bin_matrix)) { + # TODO: verify format of bin_matrix and compatibility with spatial + # network + bin_matrix <- bin_matrix + } else { + bin_matrix <- switch(bin_method, + "kmeans" = kmeans_binarize_wrapper( + expr_values = expression_matrix, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + # set.seed = set.seed, + seed = seed + ), + "rank" = rank_binarize_wrapper( + expr_values = expression_matrix, + subset_feats = subset_feats, + percentage_rank = percentage_rank + ) + ) + } - bin_matrix <- switch(bin_method, - "kmeans" = kmeans_binarize_wrapper( - expr_values = expression_matrix, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - # set.seed = set.seed, - seed = seed - ), - "rank" = rank_binarize_wrapper( - expr_values = expression_matrix, - subset_feats = subset_feats, - percentage_rank = percentage_rank - ) + vmsg(.v = verbose, "\n 1. matrix binarization complete") + + ## start with enrichment ## + ## --------------------- ## + + result <- switch(implementation, + "simple" = { + if (do_parallel) { + warning("Parallel not yet implemented for simple. + Enrichment will default to serial.") + } + if (calc_hub) { + warning("Hub calculation is not possible with the simple + implementation, change to matrix if required.") + } + + .calc_spatial_enrichment_minimum( + spatial_network = spatial_network, + bin_matrix = bin_matrix, + adjust_method = adjust_method, + do_fisher_test = do_fisher_test + ) + }, + "matrix" = .calc_spatial_enrichment_matrix( + spatial_network = spatial_network, + bin_matrix = bin_matrix, + adjust_method = adjust_method, + do_fisher_test = do_fisher_test, + do_parallel = do_parallel, + cores = cores, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + verbose = verbose + ), + "data.table" = .calc_spatial_enrichment_dt( + bin_matrix = bin_matrix, + spatial_network = spatial_network, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + group_size = group_size, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + cores = cores + ) ) - } - - vmsg(.v = verbose, '\n 1. matrix binarization complete \n') - - ## start with enrichment ## - ## --------------------- ## - result <- switch(implementation, - "simple" = { - if(do_parallel) { - warning('Parallel not yet implemented for simple. Enrichment will default to serial.') - } - if(calc_hub) { - warning('Hub calculation is not possible with the simple implementation, change to matrix if required.') - } + vmsg(.v = verbose, "\n 2. 
spatial enrichment test completed") - .calc_spatial_enrichment_minimum( - spatial_network = spatial_network, - bin_matrix = bin_matrix, - adjust_method = adjust_method, - do_fisher_test = do_fisher_test - ) - }, - "matrix" = .calc_spatial_enrichment_matrix( - spatial_network = spatial_network, - bin_matrix = bin_matrix, - adjust_method = adjust_method, - do_fisher_test = do_fisher_test, - do_parallel = do_parallel, - cores = cores, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - verbose = verbose - ), - "data.table" = .calc_spatial_enrichment_dt( - bin_matrix = bin_matrix, - spatial_network = spatial_network, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - group_size = group_size, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - cores = cores - ) - ) - vmsg(.v = verbose, '\n 2. spatial enrichment test completed \n') + ## start with average high expression ## + ## ---------------------------------- ## + if (get_av_expr) { + # expression + if (!is.null(subset_feats)) { + expr_values <- expression_matrix[ + rownames(expression_matrix) %in% subset_feats, ] + } else { + expr_values <- expression_matrix + } - ## start with average high expression ## - ## ---------------------------------- ## - - if (get_av_expr) { + sel_expr_values <- expr_values * bin_matrix + av_expr <- apply(sel_expr_values, MARGIN = 1, FUN = function(x) { + mean(x[x > 0]) + }) + av_expr_DT <- data.table::data.table( + feats = names(av_expr), av_expr = av_expr) + result <- merge(result, av_expr_DT, by = "feats") - # expression - if (!is.null(subset_feats)) { - expr_values = expression_matrix[rownames(expression_matrix) %in% subset_feats, ] - } else { - expr_values = expression_matrix + vmsg(.v = verbose, "\n 3. (optional) average expression of high + expressing cells calculated") } - sel_expr_values = expr_values * bin_matrix - av_expr = apply(sel_expr_values, MARGIN = 1, FUN = function(x) { - mean(x[x > 0]) - }) - av_expr_DT = data.table::data.table(feats = names(av_expr), av_expr = av_expr) - result = merge(result, av_expr_DT, by = 'feats') - - vmsg(.v = verbose, '\n 3. (optional) average expression of high expressing cells calculated \n') - } - - - ## start with number of high expressing cells ## - ## ------------------------------------------ ## - if (get_high_expr) { - high_expr = rowSums(bin_matrix) - high_expr_DT = data.table::data.table(feats = names(high_expr), high_expr = high_expr) - result = merge(result, high_expr_DT, by = 'feats') + ## start with number of high expressing cells ## + ## ------------------------------------------ ## - vmsg(.v = verbose, '\n 4. (optional) number of high expressing cells calculated \n') - } + if (get_high_expr) { + high_expr <- rowSums(bin_matrix) + high_expr_DT <- data.table::data.table( + feats = names(high_expr), high_expr = high_expr) + result <- merge(result, high_expr_DT, by = "feats") + vmsg(.v = verbose, "\n 4. 
(optional) number of high expressing cells + calculated") + } - # sort - if (do_fisher_test) { - data.table::setorder(result, -score) - } else { - data.table::setorder(result, -estimate) - } - return(result) + # sort + if (do_fisher_test) { + data.table::setorder(result, -score) + } else { + data.table::setorder(result, -estimate) + } + return(result) } @@ -850,31 +877,38 @@ binSpectSingleMatrix = function(expression_matrix, #' @param expression_values expression values to use #' @param subset_feats only select a subset of features to test #' @param subset_genes deprecated, use subset_feats -#' @param spatial_network_name name of spatial network to use (default = 'spatial_network') +#' @param spatial_network_name name of spatial network to use +#' (default = 'spatial_network') #' @param reduce_network default uses the full network -#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset) +#' @param kmeans_algo kmeans algorithm to use +#' (kmeans, kmeans_arma, kmeans_arma_subset) #' @param nstart kmeans: nstart parameter #' @param iter_max kmeans: iter.max parameter #' @param extreme_nr number of top and bottom cells (see details) #' @param sample_nr total number of cells to sample (see details) #' @param percentage_rank percentage of top cells for binarization #' @param do_fisher_test perform fisher test -#' @param adjust_method p-value adjusted method to use (see \code{\link[stats]{p.adjust}}) +#' @param adjust_method p-value adjusted method to use +#' (see \code{\link[stats]{p.adjust}}) #' @param calc_hub calculate the number of hub cells #' @param hub_min_int minimum number of cell-cell interactions for a hub cell -#' @param get_av_expr calculate the average expression per gene of the high expressing cells +#' @param get_av_expr calculate the average expression per gene of the high +#' expressing cells #' @param get_high_expr calculate the number of high expressing cells per gene #' @param implementation enrichment implementation (data.table, simple, matrix) -#' @param group_size number of genes to process together with data.table implementation (default = automatic) +#' @param group_size number of genes to process together with data.table +#' implementation (default = automatic) #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param verbose be verbose #' @param set.seed deprecated. #' @param seed Seed for kmeans binarization. If NULL passed, no seed is set. #' Otherwise, the input value is used as seed. -#' @param bin_matrix a binarized matrix, when provided it will skip the binarization process +#' @param bin_matrix a binarized matrix, when provided it will skip the +#' binarization process #' @return data.table with results (see details) -#' @details We provide two ways to identify spatial genes based on gene expression binarization. +#' @details We provide two ways to identify spatial genes based on gene +#' expression binarization. #' Both methods are identicial except for how binarization is performed. #' \itemize{ #' \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} @@ -895,116 +929,129 @@ binSpectSingleMatrix = function(expression_matrix, #' \item{Average expression of each gene within high expressing cells } #' \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} #' } -#' By selecting a subset of likely spatial genes (e.g. 
soft thresholding highly variable genes) can accelerate the speed. -#' The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -#' The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +#' By selecting a subset of likely spatial genes +#' (e.g. soft thresholding highly variable genes) can accelerate the speed. +#' The simple implementation is usually faster, but lacks the possibility to +#' run in parallel and to calculate hub cells. +#' The data.table implementation might be more appropriate for large datasets +#' by setting the group_size (number of genes) parameter to divide the workload. #' @export -binSpectSingle = function(gobject, - spat_unit = NULL, - feat_type = NULL, - bin_method = c('kmeans', 'rank'), - expression_values = c('normalized', 'scaled', 'custom'), - subset_feats = NULL, - subset_genes = NULL, - spatial_network_name = 'Delaunay_network', - reduce_network = FALSE, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - percentage_rank = 30, - do_fisher_test = TRUE, - adjust_method = 'fdr', - calc_hub = FALSE, - hub_min_int = 3, - get_av_expr = TRUE, - get_high_expr = TRUE, - implementation = c('data.table', 'simple', 'matrix'), - group_size = 'automatic', - do_parallel = TRUE, - cores = NA, - verbose = TRUE, - set.seed = deprecated(), - seed = 1234, - bin_matrix = NULL) { - - - ## deprecated arguments - if(!is.null(subset_genes)) { - subset_feats = subset_genes - warning('subset_genes is deprecated, use subset_feats in the future \n') - } - - if (is_present(set.seed) && !is.function(set.seed)) { - deprecate_warn( - when = "4.0.3", - what = "binSpectSingle(set.seed)", - with = "binSpectSingle(seed)" +binSpectSingle <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + bin_method = c("kmeans", "rank"), + expression_values = c("normalized", "scaled", "custom"), + subset_feats = NULL, + subset_genes = NULL, + spatial_network_name = "Delaunay_network", + reduce_network = FALSE, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + percentage_rank = 30, + do_fisher_test = TRUE, + adjust_method = "fdr", + calc_hub = FALSE, + hub_min_int = 3, + get_av_expr = TRUE, + get_high_expr = TRUE, + implementation = c("data.table", "simple", "matrix"), + group_size = "automatic", + do_parallel = TRUE, + cores = NA, + verbose = TRUE, + set.seed = deprecated(), + seed = 1234, + bin_matrix = NULL) { + ## deprecated arguments + if (!is.null(subset_genes)) { + subset_feats <- subset_genes + warning("subset_genes is deprecated, use subset_feats in the future") + } + + if (is_present(set.seed) && !is.function(set.seed)) { + deprecate_warn( + when = "4.0.3", + what = "binSpectSingle(set.seed)", + with = "binSpectSingle(seed)" + ) + + seed <- set.seed + set.seed <- NULL + } + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - seed <- set.seed - set.seed <- NULL - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## 1. 
expression matrix - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - - ## 2. spatial network - spatial_network = get_spatialNetwork(gobject = gobject, - spat_unit = spat_unit, - name = spatial_network_name, - output = 'networkDT') - if(is.null(spatial_network)) { - stop('spatial_network_name: ', spatial_network_name, ' does not exist, create a spatial network first') - } - - # convert to full network - if(reduce_network == FALSE) { - spatial_network = convert_to_full_spatial_network(spatial_network) - data.table::setnames(spatial_network, c('source', 'target'), c('from', 'to')) - } - - - binSpectSingleMatrix(expression_matrix = expr_values, - spatial_network = spatial_network, - bin_matrix = bin_matrix, - bin_method = bin_method, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - percentage_rank = percentage_rank, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - # set.seed = set.seed, - seed = seed) + ## 1. expression matrix + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + + + ## 2. 
spatial network + spatial_network <- getSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "networkDT" + ) + if (is.null(spatial_network)) { + stop("spatial_network_name: ", spatial_network_name, + " does not exist, create a spatial network first") + } + + # convert to full network + if (reduce_network == FALSE) { + spatial_network <- convert_to_full_spatial_network(spatial_network) + data.table::setnames( + spatial_network, c("source", "target"), c("from", "to")) + } + binSpectSingleMatrix( + expression_matrix = expr_values, + spatial_network = spatial_network, + bin_matrix = bin_matrix, + bin_method = bin_method, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + percentage_rank = percentage_rank, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + # set.seed = set.seed, + seed = seed + ) } @@ -1023,20 +1070,24 @@ binSpectSingle = function(gobject, #' @param subset_genes deprecated, use subset_feats #' @param spatial_network_k different k's for a spatial kNN to evaluate #' @param reduce_network default uses the full network -#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset) +#' @param kmeans_algo kmeans algorithm to use +#' (kmeans, kmeans_arma, kmeans_arma_subset) #' @param nstart kmeans: nstart parameter #' @param iter_max kmeans: iter.max parameter #' @param extreme_nr number of top and bottom cells (see details) #' @param sample_nr total number of cells to sample (see details) #' @param percentage_rank percentage of top cells for binarization #' @param do_fisher_test perform fisher test -#' @param adjust_method p-value adjusted method to use (see \code{\link[stats]{p.adjust}}) +#' @param adjust_method p-value adjusted method to use +#' (see \code{\link[stats]{p.adjust}}) #' @param calc_hub calculate the number of hub cells #' @param hub_min_int minimum number of cell-cell interactions for a hub cell -#' @param get_av_expr calculate the average expression per gene of the high expressing cells +#' @param get_av_expr calculate the average expression per gene of the high +#' expressing cells #' @param get_high_expr calculate the number of high expressing cells per gene #' @param implementation enrichment implementation (data.table, simple, matrix) -#' @param group_size number of genes to process together with data.table implementation (default = automatic) +#' @param group_size number of genes to process together with data.table +#' implementation (default = automatic) #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param verbose be verbose @@ -1046,8 +1097,9 @@ binSpectSingle = function(gobject, #' Otherwise, the input value is used as seed. #' @param summarize summarize the p-values or adjusted p-values #' @return data.table with results (see details) -#' @details We provide two ways to identify spatial genes based on gene expression binarization. -#' Both methods are identicial except for how binarization is performed. +#' @details We provide two ways to identify spatial genes based on gene +#' expression binarization. 
+#' Both methods are identical except for how binarization is performed. #' \itemize{ #' \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} #' \item{2. network: }{Alll cells are connected through a spatial network based on the physical coordinates} @@ -1067,217 +1119,230 @@ binSpectSingle = function(gobject, #' \item{Average expression of each gene within high expressing cells } #' \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} #' } -#' By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -#' The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -#' The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +#' By selecting a subset of likely spatial genes +#' (e.g. soft thresholding highly variable genes) can accelerate the speed. +#' The simple implementation is usually faster, but lacks the possibility to +#' run in parallel and to calculate hub cells. +#' The data.table implementation might be more appropriate for large datasets +#' by setting the group_size (number of genes) parameter to divide the workload. #' @export -binSpectMulti = function(gobject, - feat_type = NULL, - spat_unit = NULL, - bin_method = c('kmeans', 'rank'), - expression_values = c('normalized', 'scaled', 'custom'), - subset_feats = NULL, - subset_genes = NULL, - spatial_network_k = c(5, 10, 20), - reduce_network = FALSE, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - percentage_rank = c(10, 30), - do_fisher_test = TRUE, - adjust_method = 'fdr', - calc_hub = FALSE, - hub_min_int = 3, - get_av_expr = TRUE, - get_high_expr = TRUE, - implementation = c('data.table', 'simple', 'matrix'), - group_size = 'automatic', - do_parallel = TRUE, - cores = NA, - verbose = TRUE, - knn_params = NULL, - set.seed = deprecated(), - seed = 1234, - summarize = c('adj.p.value', 'p.value') -) { - - - ## deprecated arguments - if(!is.null(subset_genes)) { - subset_feats = subset_genes - warning('subset_genes is deprecated, use subset_feats in the future \n') - } - - if (is_present(set.seed) && !is.function(set.seed)) { - deprecate_warn( - when = "4.0.3", - what = "binSpectMulti(set.seed)", - with = "binSpectMulti(seed)" - ) - - seed <- set.seed - set.seed <- NULL - } - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - if(verbose == TRUE) cat('\n This is the multi parameter version of binSpect') - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - # check bin_method - bin_method = match.arg(bin_method, choices = c('kmeans', 'rank')) - - # summarization level - summarize = match.arg(summarize, choices = c('adj.p.value', 'p.value')) - - ## bin method rank - if(bin_method == 'rank') { - - total_trials = length(spatial_network_k)*length(percentage_rank) - result_list = vector(mode = 'list', length = total_trials) - i = 1 - - for(k in spatial_network_k) { - - if(is.null(knn_params)) { - knn_params = list(minimum_k = 1) - } - 
temp_gobject = do.call('createSpatialKNNnetwork', c(gobject = gobject, - spat_unit = spat_unit, - name = 'temp_knn_network', - k = k, - knn_params)) - - for(rank_i in percentage_rank) { - - if(verbose == TRUE) cat('\n Run for k = ', k, ' and rank % = ', rank_i,'\n') - - result = binSpectSingle(gobject = temp_gobject, - feat_type = feat_type, - spat_unit = spat_unit, - bin_method = bin_method, - expression_values = expression_values, - subset_feats = subset_feats, - spatial_network_name = 'temp_knn_network', - reduce_network = reduce_network, - kmeans_algo = kmeans_algo, - percentage_rank = rank_i, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - # set.seed = set.seed, - seed = seed) - - result_list[[i]] = result - i = i+1 - } +binSpectMulti <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + bin_method = c("kmeans", "rank"), + expression_values = c("normalized", "scaled", "custom"), + subset_feats = NULL, + subset_genes = NULL, + spatial_network_k = c(5, 10, 20), + reduce_network = FALSE, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + percentage_rank = c(10, 30), + do_fisher_test = TRUE, + adjust_method = "fdr", + calc_hub = FALSE, + hub_min_int = 3, + get_av_expr = TRUE, + get_high_expr = TRUE, + implementation = c("data.table", "simple", "matrix"), + group_size = "automatic", + do_parallel = TRUE, + cores = NA, + verbose = TRUE, + knn_params = NULL, + set.seed = deprecated(), + seed = 1234, + summarize = c("adj.p.value", "p.value")) { + ## deprecated arguments + if (!is.null(subset_genes)) { + subset_feats <- subset_genes + warning("subset_genes is deprecated, use subset_feats in the future") } - combined_result = data.table::rbindlist(result_list) - - } else if(bin_method == 'kmeans') { - - ## bin method kmeans - total_trials = length(spatial_network_k) - result_list = vector(mode = 'list', length = total_trials) - i = 1 - - ## expression matrix - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values) - - - # pre-calculate bin_matrix once - bin_matrix = kmeans_binarize_wrapper(expr_values = expr_values, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - # set.seed = set.seed, - seed = seed) - - for(k in spatial_network_k) { - - if(is.null(knn_params)) { - knn_params = list(minimum_k = 1) - } - temp_gobject = do.call('createSpatialKNNnetwork', c(gobject = gobject, - spat_unit = spat_unit, - name = 'temp_knn_network', - k = k, knn_params)) - - if(verbose == TRUE) cat('\n Run for k = ', k,'\n') - - result = binSpectSingle(gobject = temp_gobject, - feat_type = feat_type, - spat_unit = spat_unit, - bin_method = bin_method, - expression_values = expression_values, - subset_feats = subset_feats, - spatial_network_name = 'temp_knn_network', - reduce_network = reduce_network, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - 
calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - # set.seed = set.seed, - seed = seed, - bin_matrix = bin_matrix) - - result_list[[i]] = result - i = i+1 + if (is_present(set.seed) && !is.function(set.seed)) { + deprecate_warn( + when = "4.0.3", + what = "binSpectMulti(set.seed)", + with = "binSpectMulti(seed)" + ) + + seed <- set.seed + set.seed <- NULL } - combined_result = data.table::rbindlist(result_list) + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } + if (verbose == TRUE) + message("This is the multi parameter version of binSpect") + + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) + + # check bin_method + bin_method <- match.arg(bin_method, choices = c("kmeans", "rank")) + + # summarization level + summarize <- match.arg(summarize, choices = c("adj.p.value", "p.value")) + + ## bin method rank + if (bin_method == "rank") { + total_trials <- length(spatial_network_k) * length(percentage_rank) + result_list <- vector(mode = "list", length = total_trials) + i <- 1 + + for (k in spatial_network_k) { + if (is.null(knn_params)) { + knn_params <- list(minimum_k = 1) + } + temp_gobject <- do.call("createSpatialKNNnetwork", c( + gobject = gobject, + spat_unit = spat_unit, + name = "temp_knn_network", + k = k, + knn_params + )) + + for (rank_i in percentage_rank) { + if (verbose == TRUE) + cat("Run for k = ", k, " and rank % = ", rank_i) + + result <- binSpectSingle( + gobject = temp_gobject, + feat_type = feat_type, + spat_unit = spat_unit, + bin_method = bin_method, + expression_values = expression_values, + subset_feats = subset_feats, + spatial_network_name = "temp_knn_network", + reduce_network = reduce_network, + kmeans_algo = kmeans_algo, + percentage_rank = rank_i, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + # set.seed = set.seed, + seed = seed + ) + + result_list[[i]] <- result + i <- i + 1 + } + } + combined_result <- data.table::rbindlist(result_list) + } else if (bin_method == "kmeans") { + ## bin method kmeans + total_trials <- length(spatial_network_k) + result_list <- vector(mode = "list", length = total_trials) + i <- 1 + + ## expression matrix + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values + ) + + + # pre-calculate bin_matrix once + bin_matrix <- kmeans_binarize_wrapper( + expr_values = expr_values, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + # set.seed = set.seed, + seed = seed + ) + + for (k in spatial_network_k) { + if (is.null(knn_params)) { + knn_params <- list(minimum_k = 1) + } + temp_gobject <- do.call("createSpatialKNNnetwork", c( + gobject = gobject, 
+ spat_unit = spat_unit, + name = "temp_knn_network", + k = k, knn_params + )) + + if (verbose == TRUE) cat("Run for k = ", k) + + result <- binSpectSingle( + gobject = temp_gobject, + feat_type = feat_type, + spat_unit = spat_unit, + bin_method = bin_method, + expression_values = expression_values, + subset_feats = subset_feats, + spatial_network_name = "temp_knn_network", + reduce_network = reduce_network, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + # set.seed = set.seed, + seed = seed, + bin_matrix = bin_matrix + ) + + result_list[[i]] <- result + i <- i + 1 + } + combined_result <- data.table::rbindlist(result_list) + } - # data.table variables - feats = V1 = p.val = NULL - ## merge results into 1 p-value per feat ## - simple_result = combined_result[, sum(log(get(summarize))), by = feats] - simple_result[, V1 := V1*-2] - simple_result[, p.val := stats::pchisq(q = V1, df = total_trials, log.p = F, lower.tail = F)] + # data.table variables + feats <- V1 <- p.val <- NULL - return(list(combined = combined_result, simple = simple_result[,.(feats, p.val)])) + ## merge results into 1 p-value per feat ## + simple_result <- combined_result[, sum(log(get(summarize))), by = feats] + simple_result[, V1 := V1 * -2] + simple_result[, p.val := stats::pchisq( + q = V1, df = total_trials, log.p = FALSE, lower.tail = FALSE)] + return(list( + combined = combined_result, simple = simple_result[, .(feats, p.val)])) } @@ -1286,25 +1351,30 @@ binSpectMulti = function(gobject, #' @title binSpectMultiMatrix #' @name binSpectMultiMatrix -#' @description binSpect for a single spatial network and a provided expression matrix +#' @description binSpect for a single spatial network and a provided +#' expression matrix #' @param expression_matrix expression matrix #' @param spatial_networks list of spatial networks in data.table format #' @param bin_method method to binarize gene expression #' @param subset_feats only select a subset of features to test -#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset) +#' @param kmeans_algo kmeans algorithm to use +#' (kmeans, kmeans_arma, kmeans_arma_subset) #' @param nstart kmeans: nstart parameter #' @param iter_max kmeans: iter.max parameter #' @param extreme_nr number of top and bottom cells (see details) #' @param sample_nr total number of cells to sample (see details) #' @param percentage_rank vector of percentages of top cells for binarization #' @param do_fisher_test perform fisher test -#' @param adjust_method p-value adjusted method to use (see \code{\link[stats]{p.adjust}}) +#' @param adjust_method p-value adjusted method to use +#' (see \code{\link[stats]{p.adjust}}) #' @param calc_hub calculate the number of hub cells #' @param hub_min_int minimum number of cell-cell interactions for a hub cell -#' @param get_av_expr calculate the average expression per gene of the high expressing cells +#' @param get_av_expr calculate the average expression per gene of the high +#' expressing cells #' @param get_high_expr calculate the number of high expressing cells per gene #' @param implementation enrichment implementation (data.table, simple, matrix) -#' 
@param group_size number of genes to process together with data.table implementation (default = automatic) +#' @param group_size number of genes to process together with data.table +#' implementation (default = automatic) #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param verbose be verbose @@ -1314,165 +1384,163 @@ binSpectMulti = function(gobject, #' is set. #' @param summarize summarize the p-values or adjusted p-values #' @return data.table with results -binSpectMultiMatrix = function(expression_matrix, - spatial_networks, - bin_method = c('kmeans', 'rank'), - subset_feats = NULL, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - percentage_rank = c(10, 30), - do_fisher_test = TRUE, - adjust_method = 'fdr', - calc_hub = FALSE, - hub_min_int = 3, - get_av_expr = TRUE, - get_high_expr = TRUE, - implementation = c('data.table', 'simple', 'matrix'), - group_size = 'automatic', - do_parallel = TRUE, - cores = NA, - verbose = T, - knn_params = NULL, - set.seed = deprecated(), - seed = 1234, - summarize = c('adj.p.value', 'p.value') -) { - - if (is_present(set.seed) && !is.function(set.seed)) { - deprecate_warn( - when = "4.0.3", - what = "binSpectMultiMatrix(set.seed)", - with = "binSpectMultiMatrix(seed)" - ) - - seed <- set.seed - set.seed <- NULL - } - - - if(verbose == TRUE) cat('\n This is the multi parameter version of binSpect') - - # set number of cores automatically, but with limit of 10 - cores = determine_cores(cores) - data.table::setDTthreads(threads = cores) - - # check bin_method - bin_method = match.arg(bin_method, choices = c('kmeans', 'rank')) - - # summarization level - summarize = match.arg(summarize, choices = c('adj.p.value', 'p.value')) - - - ## bin method rank - if(bin_method == 'rank') { - - total_trials = length(spatial_networks)*length(percentage_rank) - result_list = vector(mode = 'list', length = total_trials) - i = 1 - - for(k in seq_along(spatial_networks)) { - - for(rank_i in percentage_rank) { - - if(verbose == TRUE) cat('\n Run for spatial network ', k ,' and rank % = ', rank_i,'\n') - - result = binSpectSingleMatrix(expression_matrix = expression_matrix, - spatial_network = spatial_networks[[k]], - bin_method = bin_method, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - percentage_rank = rank_i, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - # set.seed = set.seed, - seed = seed) - - result_list[[i]] = result - i = i+1 - } - } - combined_result = data.table::rbindlist(result_list) - - - } else if(bin_method == 'kmeans') { - - ## bin method kmeans - total_trials = length(spatial_networks) - result_list = vector(mode = 'list', length = total_trials) - i = 1 - - - # pre-calculate bin_matrix once - bin_matrix = kmeans_binarize_wrapper(expr_values = expression_matrix, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - # set.seed = set.seed, - seed = seed) - - for(k in seq_along(spatial_networks)) { - - if(verbose == TRUE) cat('\n Run for spatial network = ', k,'\n') - - result = binSpectSingleMatrix(expression_matrix 
= expression_matrix, - bin_matrix = bin_matrix, - spatial_network = spatial_networks[[k]], - bin_method = bin_method, - subset_feats = subset_feats, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - # set.seed = set.seed, - seed = seed) - - result_list[[i]] = result - i = i+1 - +binSpectMultiMatrix <- function(expression_matrix, + spatial_networks, + bin_method = c("kmeans", "rank"), + subset_feats = NULL, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + percentage_rank = c(10, 30), + do_fisher_test = TRUE, + adjust_method = "fdr", + calc_hub = FALSE, + hub_min_int = 3, + get_av_expr = TRUE, + get_high_expr = TRUE, + implementation = c("data.table", "simple", "matrix"), + group_size = "automatic", + do_parallel = TRUE, + cores = NA, + verbose = TRUE, + knn_params = NULL, + set.seed = deprecated(), + seed = 1234, + summarize = c("adj.p.value", "p.value")) { + if (is_present(set.seed) && !is.function(set.seed)) { + deprecate_warn( + when = "4.0.3", + what = "binSpectMultiMatrix(set.seed)", + with = "binSpectMultiMatrix(seed)" + ) + + seed <- set.seed + set.seed <- NULL } - combined_result = data.table::rbindlist(result_list) - } + if (verbose == TRUE) + message("This is the multi parameter version of binSpect") + + # set number of cores automatically, but with limit of 10 + cores <- determine_cores(cores) + data.table::setDTthreads(threads = cores) + + # check bin_method + bin_method <- match.arg(bin_method, choices = c("kmeans", "rank")) + + # summarization level + summarize <- match.arg(summarize, choices = c("adj.p.value", "p.value")) + + + ## bin method rank + if (bin_method == "rank") { + total_trials <- length(spatial_networks) * length(percentage_rank) + result_list <- vector(mode = "list", length = total_trials) + i <- 1 + + for (k in seq_along(spatial_networks)) { + for (rank_i in percentage_rank) { + if (verbose == TRUE) + cat("Run for spatial network ", k, " and rank % = ", rank_i) + + result <- binSpectSingleMatrix( + expression_matrix = expression_matrix, + spatial_network = spatial_networks[[k]], + bin_method = bin_method, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + percentage_rank = rank_i, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + # set.seed = set.seed, + seed = seed + ) + + result_list[[i]] <- result + i <- i + 1 + } + } + combined_result <- data.table::rbindlist(result_list) + } else if (bin_method == "kmeans") { + ## bin method kmeans + total_trials <- length(spatial_networks) + result_list <- vector(mode = "list", length = total_trials) + i <- 1 + + + # pre-calculate bin_matrix once + bin_matrix <- kmeans_binarize_wrapper( + expr_values = expression_matrix, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + # set.seed = 
set.seed, + seed = seed + ) + + for (k in seq_along(spatial_networks)) { + if (verbose == TRUE) cat("Run for spatial network = ", k) + + result <- binSpectSingleMatrix( + expression_matrix = expression_matrix, + bin_matrix = bin_matrix, + spatial_network = spatial_networks[[k]], + bin_method = bin_method, + subset_feats = subset_feats, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + # set.seed = set.seed, + seed = seed + ) + + result_list[[i]] <- result + i <- i + 1 + } + combined_result <- data.table::rbindlist(result_list) + } - # data.table variables - feats = V1 = p.val = NULL - ## merge results into 1 p-value per feat ## - simple_result = combined_result[, sum(log(get(summarize))), by = feats] - simple_result[, V1 := V1*-2] - simple_result[, p.val := stats::pchisq(q = V1, df = total_trials, log.p = F, lower.tail = F)] + # data.table variables + feats <- V1 <- p.val <- NULL - return(list(combined = combined_result, simple = simple_result[,.(feats, p.val)])) + ## merge results into 1 p-value per feat ## + simple_result <- combined_result[, sum(log(get(summarize))), by = feats] + simple_result[, V1 := V1 * -2] + simple_result[, p.val := stats::pchisq( + q = V1, df = total_trials, log.p = FALSE, lower.tail = FALSE)] + return(list( + combined = combined_result, simple = simple_result[, .(feats, p.val)])) } @@ -1482,7 +1550,8 @@ binSpectMultiMatrix = function(expression_matrix, #' @title binSpect #' @name binSpect -#' @description Previously: binGetSpatialGenes. BinSpect (Binary Spatial Extraction of genes) is a fast computational method +#' @description Previously: binGetSpatialGenes. BinSpect +#' (Binary Spatial Extraction of genes) is a fast computational method #' that identifies genes with a spatially coherent expression pattern. 
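For orientation, a minimal usage sketch (illustrative only, not from the original documentation; `g` stands in for an existing giotto object that already carries the default 'Delaunay_network'):

library(Giotto)

# single-network run with kmeans binarization
spat_genes <- binSpect(
    gobject = g,
    bin_method = "kmeans",
    expression_values = "normalized",
    spatial_network_name = "Delaunay_network"
)

# supplying spatial_network_k dispatches to the multi-parameter version
# (binSpectMulti), which evaluates several kNN networks and rank cutoffs
spat_genes_multi <- binSpect(
    gobject = g,
    bin_method = "rank",
    spatial_network_k = c(5, 10, 20),
    percentage_rank = c(10, 30)
)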
#' @param gobject giotto object #' @param spat_unit spatial unit @@ -1491,23 +1560,28 @@ binSpectMultiMatrix = function(expression_matrix, #' @param expression_values expression values to use #' @param subset_feats only select a subset of features to test #' @param subset_genes deprecated, use subset_feats -#' @param spatial_network_name name of spatial network to use (default = 'spatial_network') +#' @param spatial_network_name name of spatial network to use +#' (default = 'spatial_network') #' @param spatial_network_k different k's for a spatial kNN to evaluate #' @param reduce_network default uses the full network -#' @param kmeans_algo kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset) +#' @param kmeans_algo kmeans algorithm to use +#' (kmeans, kmeans_arma, kmeans_arma_subset) #' @param nstart kmeans: nstart parameter #' @param iter_max kmeans: iter.max parameter #' @param extreme_nr number of top and bottom cells (see details) #' @param sample_nr total number of cells to sample (see details) #' @param percentage_rank percentage of top cells for binarization #' @param do_fisher_test perform fisher test -#' @param adjust_method p-value adjusted method to use (see \code{\link[stats]{p.adjust}}) +#' @param adjust_method p-value adjusted method to use +#' (see \code{\link[stats]{p.adjust}}) #' @param calc_hub calculate the number of hub cells #' @param hub_min_int minimum number of cell-cell interactions for a hub cell -#' @param get_av_expr calculate the average expression per gene of the high expressing cells +#' @param get_av_expr calculate the average expression per gene of the high +#' expressing cells #' @param get_high_expr calculate the number of high expressing cells per gene #' @param implementation enrichment implementation (data.table, simple, matrix) -#' @param group_size number of genes to process together with data.table implementation (default = automatic) +#' @param group_size number of genes to process together with data.table +#' implementation (default = automatic) #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param verbose be verbose @@ -1515,13 +1589,15 @@ binSpectMultiMatrix = function(expression_matrix, #' @param set.seed deprecated. Use \code{seed} param instead #' @param seed seed for kmeans binarization. When \code{NULL}, no seed is set. #' Otherwise, accepts a numeric input that will be used as seed. -#' @param bin_matrix a binarized matrix, when provided it will skip the binarization process +#' @param bin_matrix a binarized matrix, when provided it will skip the +#' binarization process #' @param summarize summarize the p-values or adjusted p-values #' @param return_gobject whether to return values attached to the gobject or #' separately (default) #' @return data.table with results (see details) -#' @details We provide two ways to identify spatial genes based on gene expression binarization. -#' Both methods are identicial except for how binarization is performed. +#' @details We provide two ways to identify spatial genes based on gene +#' expression binarization. +#' Both methods are identical except for how binarization is performed. #' \itemize{ #' \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} #' \item{2. 
network: }{Alll cells are connected through a spatial network based on the physical coordinates} @@ -1541,145 +1617,146 @@ binSpectMultiMatrix = function(expression_matrix, #' \item{Average expression of each gene within high expressing cells } #' \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} #' } -#' By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -#' The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -#' The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +#' By selecting a subset of likely spatial genes +#' (e.g. soft thresholding highly variable genes) can accelerate the speed. +#' The simple implementation is usually faster, but lacks the possibility to +#' run in parallel and to calculate hub cells. +#' The data.table implementation might be more appropriate for large datasets +#' by setting the group_size (number of genes) parameter to divide the workload. #' @export -binSpect = function(gobject, - spat_unit = NULL, - feat_type = NULL, - bin_method = c('kmeans', 'rank'), - expression_values = c('normalized', 'scaled', 'custom'), - subset_feats = NULL, - subset_genes = NULL, - spatial_network_name = 'Delaunay_network', - spatial_network_k = NULL, - reduce_network = FALSE, - kmeans_algo = c('kmeans', 'kmeans_arma', 'kmeans_arma_subset'), - nstart = 3, - iter_max = 10, - extreme_nr = 50, - sample_nr = 50, - percentage_rank = 30, - do_fisher_test = TRUE, - adjust_method = 'fdr', - calc_hub = FALSE, - hub_min_int = 3, - get_av_expr = TRUE, - get_high_expr = TRUE, - implementation = c('data.table', 'simple', 'matrix'), - group_size = 'automatic', - do_parallel = TRUE, - cores = NA, - verbose = TRUE, - knn_params = NULL, - set.seed = deprecated(), - seed = 1234, - bin_matrix = NULL, - summarize = c('p.value', 'adj.p.value'), - return_gobject = FALSE) { - - # TODO align set.seed, set_seed, seed_number naming and usage across packages - # use only param seed. If NULL, set no seed. If !NULL set value as seed - - if (is_present(set.seed) && !is.function(set.seed)) { - deprecate_warn( - when = "4.0.3", - what = "binSpect(set.seed)", - with = "binSpect(seed)" - ) +binSpect <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + bin_method = c("kmeans", "rank"), + expression_values = c("normalized", "scaled", "custom"), + subset_feats = NULL, + subset_genes = NULL, + spatial_network_name = "Delaunay_network", + spatial_network_k = NULL, + reduce_network = FALSE, + kmeans_algo = c("kmeans", "kmeans_arma", "kmeans_arma_subset"), + nstart = 3, + iter_max = 10, + extreme_nr = 50, + sample_nr = 50, + percentage_rank = 30, + do_fisher_test = TRUE, + adjust_method = "fdr", + calc_hub = FALSE, + hub_min_int = 3, + get_av_expr = TRUE, + get_high_expr = TRUE, + implementation = c("data.table", "simple", "matrix"), + group_size = "automatic", + do_parallel = TRUE, + cores = NA, + verbose = TRUE, + knn_params = NULL, + set.seed = deprecated(), + seed = 1234, + bin_matrix = NULL, + summarize = c("p.value", "adj.p.value"), + return_gobject = FALSE) { + # TODO align set.seed, set_seed, seed_number naming and usage across + # packages + # use only param seed. If NULL, set no seed. 
If !NULL set value as seed + + if (is_present(set.seed) && !is.function(set.seed)) { + deprecate_warn( + when = "4.0.3", + what = "binSpect(set.seed)", + with = "binSpect(seed)" + ) + + seed <- set.seed + set.seed <- NULL + } - seed <- set.seed - set.seed <- NULL - } - - if(!is.null(spatial_network_k)) { - - output = binSpectMulti(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - bin_method = bin_method, - expression_values = expression_values, - subset_feats = subset_feats, - subset_genes = subset_genes, - spatial_network_k = spatial_network_k, - reduce_network = reduce_network, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - percentage_rank = percentage_rank, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - knn_params = knn_params, - seed = seed, - # set.seed = set.seed, - summarize = summarize) - - } else { - - output = binSpectSingle(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - bin_method = bin_method, - expression_values = expression_values, - subset_feats = subset_feats, - subset_genes = subset_genes, - spatial_network_name = spatial_network_name, - reduce_network = reduce_network, - kmeans_algo = kmeans_algo, - nstart = nstart, - iter_max = iter_max, - extreme_nr = extreme_nr, - sample_nr = sample_nr, - percentage_rank = percentage_rank, - do_fisher_test = do_fisher_test, - adjust_method = adjust_method, - calc_hub = calc_hub, - hub_min_int = hub_min_int, - get_av_expr = get_av_expr, - get_high_expr = get_high_expr, - implementation = implementation, - group_size = group_size, - do_parallel = do_parallel, - cores = cores, - verbose = verbose, - seed = seed, - # set.seed = set.seed, - bin_matrix = bin_matrix) - - } - - #return(output) - - if(return_gobject==TRUE){ - #if("binSpect.pval" %in% names(fDataDT(gobject, spat_unit = spat_unit, feat_type = feat_type))){ - # removeFeatAnnotation(gobject, spat_unit = spat_unit, feat_type = feat_type, columns=c("binSpect.pval")) - #} - result_dt = data.table::data.table(feats=output$feats, pval=output$adj.p.value) - data.table::setnames(result_dt, old = "pval", new = "binSpect.pval") - gobject <- addFeatMetadata( - gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = result_dt, - by_column = TRUE, - column_feat_ID = "feats" - ) - return(gobject) - }else{ - return(output) - } + if (!is.null(spatial_network_k)) { + output <- binSpectMulti( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + bin_method = bin_method, + expression_values = expression_values, + subset_feats = subset_feats, + subset_genes = subset_genes, + spatial_network_k = spatial_network_k, + reduce_network = reduce_network, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + percentage_rank = percentage_rank, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + knn_params = knn_params, + seed = seed, + summarize = summarize + ) + 
} else { + output <- binSpectSingle( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + bin_method = bin_method, + expression_values = expression_values, + subset_feats = subset_feats, + subset_genes = subset_genes, + spatial_network_name = spatial_network_name, + reduce_network = reduce_network, + kmeans_algo = kmeans_algo, + nstart = nstart, + iter_max = iter_max, + extreme_nr = extreme_nr, + sample_nr = sample_nr, + percentage_rank = percentage_rank, + do_fisher_test = do_fisher_test, + adjust_method = adjust_method, + calc_hub = calc_hub, + hub_min_int = hub_min_int, + get_av_expr = get_av_expr, + get_high_expr = get_high_expr, + implementation = implementation, + group_size = group_size, + do_parallel = do_parallel, + cores = cores, + verbose = verbose, + seed = seed, + # set.seed = set.seed, + bin_matrix = bin_matrix + ) + } + + # return(output) + + if (return_gobject == TRUE) { + + result_dt <- data.table::data.table( + feats = output$feats, pval = output$adj.p.value) + data.table::setnames(result_dt, old = "pval", new = "binSpect.pval") + gobject <- addFeatMetadata( + gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = result_dt, + by_column = TRUE, + column_feat_ID = "feats" + ) + return(gobject) + } else { + return(output) + } } @@ -1687,11 +1764,15 @@ binSpect = function(gobject, #' @title silhouetteRank #' @name silhouetteRank -#' @description Previously: calculate_spatial_genes_python. This method computes a silhouette score per gene based on the -#' spatial distribution of two partitions of cells (expressed L1, and non-expressed L0). -#' Here, rather than L2 Euclidean norm, it uses a rank-transformed, exponentially weighted +#' @description Previously: calculate_spatial_genes_python. This method +#' computes a silhouette score per gene based on the +#' spatial distribution of two partitions of cells +#' (expressed L1, and non-expressed L0). +#' Here, rather than L2 Euclidean norm, it uses a rank-transformed, +#' exponentially weighted #' function to represent the local physical distance between two cells. 
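For orientation, a minimal call sketch using the defaults shown in the signature below (illustrative only; `g` stands in for a giotto object whose instructions point at a python environment, since the score is computed through reticulate):

library(Giotto)

silh_scores <- silhouetteRank(
    gobject = g,
    expression_values = "normalized",
    metric = "euclidean",
    rbp_p = 0.95,
    examine_top = 0.3
)
# returns a data.table with one spatial score per gene
head(silh_scores)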
-#' New multi aggregator implementation can be found at \code{\link{silhouetteRankTest}} +#' New multi aggregator implementation can be found at +#' \code{\link{silhouetteRankTest}} #' @param gobject giotto object #' @param expression_values expression values to use #' @param metric distance metric to use @@ -1702,69 +1783,69 @@ binSpect = function(gobject, #' @return data.table with spatial scores #' @export silhouetteRank <- function(gobject, - expression_values = c('normalized', 'scaled', 'custom'), - metric = "euclidean", - subset_genes = NULL, - rbp_p = 0.95, - examine_top = 0.3, - python_path = NULL) { - - - # expression values - values = match.arg(expression_values, c('normalized', 'scaled', 'custom')) - expr_values = getExpression(gobject = gobject, - values = values, - output = "matrix") - - # subset genes - if(!is.null(subset_genes)) { - - subset_genes = subset_genes[subset_genes %in% gobject@feat_ID] - expr_values = expr_values[rownames(expr_values) %in% subset_genes, ] - } - - - # data.table variables - sdimx = sdimy = NULL - - # spatial locations - # spatlocs = as.matrix(gobject@spatial_locs[['cell']][['raw']][,.(sdimx, sdimy)]) - spatlocs = getSpatialLocations(gobject, - spat_unit = 'cell', - name = 'raw', - output = 'data.table', - copy_obj = TRUE) - spatlocs = as.matrix(spatlocs[,.(sdimx, sdimy)]) + expression_values = c("normalized", "scaled", "custom"), + metric = "euclidean", + subset_genes = NULL, + rbp_p = 0.95, + examine_top = 0.3, + python_path = NULL) { + # expression values + values <- match.arg(expression_values, c("normalized", "scaled", "custom")) + expr_values <- getExpression( + gobject = gobject, + values = values, + output = "matrix" + ) - # python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } + # subset genes + if (!is.null(subset_genes)) { + subset_genes <- subset_genes[subset_genes %in% gobject@feat_ID] + expr_values <- expr_values[rownames(expr_values) %in% subset_genes, ] + } - ## prepare python path and louvain script - reticulate::use_python(required = T, python = python_path) - python_silh_function = system.file("python", "python_spatial_genes.py", package = 'Giotto') - reticulate::source_python(file = python_silh_function) - output_python = python_spatial_genes(spatial_locations = spatlocs, - expression_matrix = as.data.frame(as.matrix(expr_values)), - metric = metric, - rbp_p = rbp_p, - examine_top = examine_top) + # data.table variables + sdimx <- sdimy <- NULL + + # spatial locations + spatlocs <- getSpatialLocations(gobject, + spat_unit = "cell", + name = "raw", + output = "data.table", + copy_obj = TRUE + ) + spatlocs <- as.matrix(spatlocs[, .(sdimx, sdimy)]) - # unlist output - genes = unlist(lapply(output_python, FUN = function(x) { - y = x[1][[1]] - })) - scores = unlist(lapply(output_python, FUN = function(x) { - y = x[2][[1]] - })) + # python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") + } - spatial_python_DT = data.table::data.table(genes = genes, scores = scores) + ## prepare python path and louvain script + reticulate::use_python(required = TRUE, python = python_path) + python_silh_function <- system.file( + "python", "python_spatial_genes.py", package = "Giotto") + reticulate::source_python(file = python_silh_function) + + output_python <- python_spatial_genes( + spatial_locations = spatlocs, + expression_matrix = as.data.frame(as.matrix(expr_values)), + metric = metric, + rbp_p = rbp_p, + examine_top = 
examine_top + ) - return(spatial_python_DT) + # unlist output + genes <- unlist(lapply(output_python, FUN = function(x) { + y <- x[1][[1]] + })) + scores <- unlist(lapply(output_python, FUN = function(x) { + y <- x[2][[1]] + })) + spatial_python_DT <- data.table::data.table(genes = genes, scores = scores) + return(spatial_python_DT) } @@ -1772,7 +1853,8 @@ silhouetteRank <- function(gobject, #' @title silhouetteRankTest #' @name silhouetteRankTest -#' @description Multi parameter aggregator version of \code{\link{silhouetteRank}} +#' @description Multi parameter aggregator version of +#' \code{\link{silhouetteRank}} #' @param gobject giotto object #' @param expression_values expression values to use #' @param subset_genes only run on this subset of genes @@ -1787,40 +1869,39 @@ silhouetteRank <- function(gobject, #' @param verbose be verbose #' @return data.table with spatial scores #' @export -silhouetteRankTest = function(gobject, - expression_values = c('normalized', 'scaled', 'custom'), - subset_genes = NULL, - overwrite_input_bin = TRUE, - rbp_ps = c(0.95, 0.99), - examine_tops = c(0.005, 0.010, 0.050, 0.100, 0.300), - matrix_type = "dissim", - num_core = 4, - parallel_path = "/usr/bin", - output = NULL, - query_sizes = 10L, - verbose = FALSE) { - - - # data.table variables - cell_ID = sdimx = sdimy = sdimz = NULL - - ## test if R packages are installed - # check envstats - package_check(pkg_name = 'EnvStats', repository = c('CRAN')) - - # check eva - if ('eva' %in% rownames(installed.packages()) == FALSE) { - stop("\n package ", 'eva', " is not yet installed \n", - "To install: \n", - "install.packages('eva')") - # "install.packages('eva_0.2.5.tar.gz', repos=NULL, type='source')", - # "see https://cran.r-project.org/src/contrib/Archive/eva/") - } - - ## test if python package is installed - module_test = reticulate::py_module_available('silhouetteRank') - if(module_test == FALSE) { - warning("silhouetteRank python module is not installed: +silhouetteRankTest <- function(gobject, + expression_values = c("normalized", "scaled", "custom"), + subset_genes = NULL, + overwrite_input_bin = TRUE, + rbp_ps = c(0.95, 0.99), + examine_tops = c(0.005, 0.010, 0.050, 0.100, 0.300), + matrix_type = "dissim", + num_core = 4, + parallel_path = "/usr/bin", + output = NULL, + query_sizes = 10L, + verbose = FALSE) { + # data.table variables + cell_ID <- sdimx <- sdimy <- sdimz <- NULL + + ## test if R packages are installed + # check envstats + package_check(pkg_name = "EnvStats", repository = c("CRAN")) + + # check eva + if ("eva" %in% rownames(installed.packages()) == FALSE) { + stop( + "\n package ", "eva", " is not yet installed \n", + "To install: \n", + "install.packages('eva')" + ) + + } + + ## test if python package is installed + module_test <- reticulate::py_module_available("silhouetteRank") + if (module_test == FALSE) { + warning("silhouetteRank python module is not installed: install in the right environment or python path with: 'pip install silhouetteRank' @@ -1836,107 +1917,107 @@ silhouetteRankTest = function(gobject, conda = conda_full_path, pip = TRUE, python_version = '3.6')") - } + } - # expression values - values = match.arg(expression_values, c('normalized', 'scaled', 'custom')) - expr_values = getExpression(gobject = gobject, - values = values, - output = 'matrix') + # expression values + values <- match.arg(expression_values, c("normalized", "scaled", "custom")) + expr_values <- getExpression( + gobject = gobject, + values = values, + output = "matrix" + ) - # subset genes - 
if(!is.null(subset_genes)) { + # subset genes + if (!is.null(subset_genes)) { + subset_genes <- subset_genes[subset_genes %in% gobject@gene_ID] + expr_values <- expr_values[rownames(expr_values) %in% subset_genes, ] + } - subset_genes = subset_genes[subset_genes %in% gobject@gene_ID] - expr_values = expr_values[rownames(expr_values) %in% subset_genes, ] + # spatial locations + spatlocs <- getSpatialLocations(gobject, + name = "raw", + output = "data.table", + copy_obj = TRUE + ) - } - - # spatial locations - # spatlocs = gobject@spatial_locs[['raw']] - spatlocs = get_spatial_locations(gobject, - spat_loc_name = 'raw', - output = 'data.table', - copy_obj = TRUE) - - ## save dir and log - if(is.null(output)) { - - save_dir = readGiottoInstructions(gobject, param = "save_dir") - silh_output_dir = paste0(save_dir, '/', 'silhouetteRank_output/') - if(!file.exists(silh_output_dir)) dir.create(silh_output_dir, recursive = TRUE) - - } else if(file.exists(output)) { - - silh_output_dir = paste0(output, '/', 'silhouetteRank_output/') - if(!file.exists(silh_output_dir)) dir.create(silh_output_dir, recursive = TRUE) - - } else { - - silh_output_dir = paste0(output, '/', 'silhouetteRank_output/') - if(!file.exists(silh_output_dir)) dir.create(silh_output_dir, recursive = TRUE) - - } - - # log directory - log_dir = paste0(silh_output_dir, '/', 'logs/') - if(!file.exists(log_dir)) dir.create(log_dir, recursive = TRUE) - - - ## write spatial locations to .txt file - if(ncol(spatlocs) == 3) { - format_spatlocs = spatlocs[,.(cell_ID, sdimx, sdimy)] - colnames(format_spatlocs) = c('ID', 'x', 'y') - } else { - format_spatlocs = spatlocs[,.(cell_ID, sdimx, sdimy, sdimz)] - colnames(format_spatlocs) = c('ID', 'x', 'y', 'z') - } - - write.table(x = format_spatlocs, row.names = F, - file = paste0(silh_output_dir,'/', 'format_spatlocs.txt'), - quote = F, sep = '\t') + ## save dir and log + if (is.null(output)) { + save_dir <- readGiottoInstructions(gobject, param = "save_dir") + silh_output_dir <- paste0(save_dir, "/", "silhouetteRank_output/") + if (!file.exists(silh_output_dir)) dir.create( + silh_output_dir, recursive = TRUE) + } else if (file.exists(output)) { + silh_output_dir <- paste0(output, "/", "silhouetteRank_output/") + if (!file.exists(silh_output_dir)) dir.create( + silh_output_dir, recursive = TRUE) + } else { + silh_output_dir <- paste0(output, "/", "silhouetteRank_output/") + if (!file.exists(silh_output_dir)) dir.create( + silh_output_dir, recursive = TRUE) + } - spatlocs_path = paste0(silh_output_dir,'/', 'format_spatlocs.txt') + # log directory + log_dir <- paste0(silh_output_dir, "/", "logs/") + if (!file.exists(log_dir)) dir.create(log_dir, recursive = TRUE) - ## write expression to .txt file - #write.table(x = as.matrix(expr_values), - # file = paste0(silh_output_dir,'/', 'expression.txt'), - # quote = F, sep = '\t', col.names=NA) - silh_output_dir_norm = normalizePath(silh_output_dir) - expr_values_path_norm = paste0(silh_output_dir_norm,'/', 'expression.txt') - data.table::fwrite(data.table::as.data.table(expr_values, keep.rownames="gene"), - file=expr_values_path_norm, - quot=F, - sep="\t", - col.names=T, - row.names=F) + ## write spatial locations to .txt file + if (ncol(spatlocs) == 3) { + format_spatlocs <- spatlocs[, .(cell_ID, sdimx, sdimy)] + colnames(format_spatlocs) <- c("ID", "x", "y") + } else { + format_spatlocs <- spatlocs[, .(cell_ID, sdimx, sdimy, sdimz)] + colnames(format_spatlocs) <- c("ID", "x", "y", "z") + } - expr_values_path = paste0(silh_output_dir,'/', 
'expression.txt') + write.table( + x = format_spatlocs, row.names = FALSE, + file = paste0(silh_output_dir, "/", "format_spatlocs.txt"), + quote = FALSE, sep = "\t" + ) - ## prepare python path and louvain script - python_path = readGiottoInstructions(gobject, param = 'python_path') - reticulate::use_python(required = T, python = python_path) - python_silh_function = system.file("python", "silhouette_rank_wrapper.py", package = 'Giotto') - reticulate::source_python(file = python_silh_function) + spatlocs_path <- paste0(silh_output_dir, "/", "format_spatlocs.txt") + + silh_output_dir_norm <- normalizePath(silh_output_dir) + expr_values_path_norm <- paste0(silh_output_dir_norm, "/", "expression.txt") - output_silh = silhouette_rank(expr = expr_values_path, - centroid = spatlocs_path, - overwrite_input_bin = overwrite_input_bin, - rbp_ps = rbp_ps, - examine_tops = examine_tops, - matrix_type = matrix_type, - verbose = verbose, - num_core = num_core, - parallel_path = parallel_path, - output = silh_output_dir, - query_sizes = as.integer(query_sizes)) + data.table::fwrite(data.table::as.data.table( + expr_values, keep.rownames = "gene"), + file = expr_values_path_norm, + quote = FALSE, + sep = "\t", + col.names = TRUE, + row.names = FALSE + ) - return(output_silh) + expr_values_path <- paste0(silh_output_dir, "/", "expression.txt") + + ## prepare python path and louvain script + python_path <- readGiottoInstructions(gobject, param = "python_path") + reticulate::use_python(required = TRUE, python = python_path) + python_silh_function <- system.file( + "python", "silhouette_rank_wrapper.py", package = "Giotto") + reticulate::source_python(file = python_silh_function) + + + output_silh <- silhouette_rank( + expr = expr_values_path, + centroid = spatlocs_path, + overwrite_input_bin = overwrite_input_bin, + rbp_ps = rbp_ps, + examine_tops = examine_tops, + matrix_type = matrix_type, + verbose = verbose, + num_core = num_core, + parallel_path = parallel_path, + output = silh_output_dir, + query_sizes = as.integer(query_sizes) + ) + return(output_silh) } @@ -1960,35 +2041,35 @@ silhouetteRankTest = function(gobject, #' @param show_plot show plot #' @param return_plot return ggplot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @return a list of data.frames with results and plot (optional) -#' @details This function is a wrapper for the SpatialDE method originally implemented +#' @details This function is a wrapper for the SpatialDE method originally +#' implemented #' in python. 
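For orientation, a minimal call sketch (illustrative only; `g` stands in for a giotto object whose python_path instruction points at an environment with the SpatialDE module installed):

library(Giotto)

de_out <- spatialDE(
    gobject = g,
    expression_values = "raw",
    show_plot = FALSE,
    save_plot = FALSE,
    return_plot = FALSE
)
# de_out$results holds the SpatialDE statistics
# (plus ms_results when model selection output is available)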
See publication \doi{10.1038/nmeth.4636} #' @export spatialDE <- function(gobject = NULL, - feat_type = NULL, - spat_unit = NULL, - spat_loc_name = 'raw', - expression_values = c('raw', 'normalized', 'scaled', 'custom'), - size = c(4,2,1), - color = c("blue", "green", "red"), - sig_alpha = 0.5, - unsig_alpha = 0.5, - python_path = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'SpatialDE'){ - - - - # test if SPARK is installed ## - - module_test = reticulate::py_module_available('SpatialDE') - if(module_test == FALSE) { - warning("SpatialDE python module is not installed: + feat_type = NULL, + spat_unit = NULL, + spat_loc_name = "raw", + expression_values = c("raw", "normalized", "scaled", "custom"), + size = c(4, 2, 1), + color = c("blue", "green", "red"), + sig_alpha = 0.5, + unsig_alpha = 0.5, + python_path = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "SpatialDE") { + # test if SPARK is installed ## + + module_test <- reticulate::py_module_available("SpatialDE") + if (module_test == FALSE) { + warning("SpatialDE python module is not installed: install in the right environment or python path with: 'pip install spatialde' @@ -2004,104 +2085,123 @@ spatialDE <- function(gobject = NULL, conda = conda_full_path, pip = TRUE, python_version = '3.6')") - } - - - # print message with information # - message("using 'SpatialDE' for spatial gene/pattern detection. If used in published research, please cite: - Svensson, Valentine, Sarah A. Teichmann, and Oliver Stegle. 'SpatialDE: Identification of Spatially Variable Genes.' - Nature Methods 15, no. 5 (May 2018): 343-46. https://doi.org/10.1038/nmeth.4636.") - - - - # data.table variables - cell_ID = NULL - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression - values = match.arg(expression_values, c('raw', 'normalized', 'scaled', 'custom')) - expr_values = getExpression(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = "matrix") - - ## python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } - - ## source python file - reticulate::use_python(required = T, python = python_path) - reader_path = system.file("python", "SpatialDE_wrapper.py", package = 'Giotto') - reticulate::source_python(file = reader_path) - - ## get spatial locations - spatial_locs = getSpatialLocations(gobject, - spat_unit = spat_unit, - name = spat_loc_name, - output = "data.table") - spatial_locs <- as.data.frame(spatial_locs) - rownames(spatial_locs) <- spatial_locs$cell_ID - spatial_locs <- subset(spatial_locs, select = -cell_ID) - - ## run spatialDE - Spatial_DE_results = Spatial_DE(as.data.frame(t(as.matrix(expr_values))), spatial_locs) - - results <- as.data.frame(reticulate::py_to_r(Spatial_DE_results[[1]])) - - if(length(Spatial_DE_results) == 2){ - ms_results = as.data.frame(reticulate::py_to_r(Spatial_DE_results[[2]])) - spatial_genes_results = list(results, ms_results) - names(spatial_genes_results) = c("results", "ms_results") - } else{ - spatial_genes_results = results - ms_results = NULL - } - - - # print, return and save parameters - show_plot = ifelse(is.na(show_plot), readGiottoInstructions(gobject, param = 'show_plot'), show_plot) - save_plot = 
ifelse(is.na(save_plot), readGiottoInstructions(gobject, param = 'save_plot'), save_plot) - return_plot = ifelse(is.na(return_plot), readGiottoInstructions(gobject, param = 'return_plot'), return_plot) - - ## create plot - if(isTRUE(show_plot) || - isTRUE(save_plot) || - isTRUE(return_plot)) { - FSV_plot = FSV_show( - results = results, - ms_results = ms_results, - size =size, - color = color, - sig_alpha = sig_alpha, - unsig_alpha = unsig_alpha + } + + + # print message with information # + message("using 'SpatialDE' for spatial gene/pattern detection. If used in + published research, please cite: + Svensson, Valentine, Sarah A. Teichmann, and Oliver Stegle. + 'SpatialDE: Identification of Spatially Variable Genes.' + Nature Methods 15, no. 5 (May 2018): 343-46. + https://doi.org/10.1038/nmeth.4636.") + + + + # data.table variables + cell_ID <- NULL + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - } - ## print plot - if(show_plot == TRUE) { - print(FSV_plot) - } + # expression + values <- match.arg( + expression_values, c("raw", "normalized", "scaled", "custom")) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = FSV_plot, default_save_name = default_save_name), save_param)) - } + ## python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") + } - ## return results and plot (optional) - if(return_plot == TRUE) { - return(list(results = spatial_genes_results, plot = FSV_plot)) - } else { - return(list(results = spatial_genes_results)) - } + ## source python file + reticulate::use_python(required = TRUE, python = python_path) + reader_path <- system.file( + "python", "SpatialDE_wrapper.py", package = "Giotto") + reticulate::source_python(file = reader_path) + + ## get spatial locations + spatial_locs <- getSpatialLocations(gobject, + spat_unit = spat_unit, + name = spat_loc_name, + output = "data.table" + ) + spatial_locs <- as.data.frame(spatial_locs) + rownames(spatial_locs) <- spatial_locs$cell_ID + spatial_locs <- subset(spatial_locs, select = -cell_ID) + ## run spatialDE + Spatial_DE_results <- Spatial_DE( + as.data.frame(t(as.matrix(expr_values))), spatial_locs) + + results <- as.data.frame(reticulate::py_to_r(Spatial_DE_results[[1]])) + + if (length(Spatial_DE_results) == 2) { + ms_results <- as.data.frame( + reticulate::py_to_r(Spatial_DE_results[[2]])) + spatial_genes_results <- list(results, ms_results) + names(spatial_genes_results) <- c("results", "ms_results") + } else { + spatial_genes_results <- results + ms_results <- NULL + } + + + # print, return and save parameters + show_plot <- ifelse(is.na(show_plot), readGiottoInstructions( + gobject, param = "show_plot"), show_plot) + save_plot <- ifelse(is.na(save_plot), readGiottoInstructions( + gobject, param = "save_plot"), save_plot) + return_plot <- ifelse(is.na(return_plot), readGiottoInstructions( + gobject, param = "return_plot"), return_plot) + + ## create plot + if (isTRUE(show_plot) || + isTRUE(save_plot) || + isTRUE(return_plot)) { + FSV_plot <- FSV_show( + results = results, + ms_results = ms_results, + size = size, + color = color, + sig_alpha = sig_alpha, + unsig_alpha = 
unsig_alpha + ) + } + + ## print plot + if (show_plot == TRUE) { + print(FSV_plot) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = FSV_plot, + default_save_name = default_save_name), save_param)) + } + + ## return results and plot (optional) + if (return_plot == TRUE) { + return(list(results = spatial_genes_results, plot = FSV_plot)) + } else { + return(list(results = spatial_genes_results)) + } } @@ -2120,84 +2220,94 @@ spatialDE <- function(gobject = NULL, #' @param python_path specify specific path to python if required #' @param return_gobject show plot #' @return An updated giotto object -#' @details This function is a wrapper for the SpatialAEH method implemented in the ... +#' @details This function is a wrapper for the SpatialAEH method +#' implemented in the ... #' @export spatialAEH <- function(gobject = NULL, - feat_type = NULL, - spat_unit = NULL, - spat_loc_name = 'raw', - SpatialDE_results = NULL, - name_pattern = 'AEH_patterns', - expression_values = c('raw', 'normalized', 'scaled', 'custom'), - pattern_num = 6, - l = 1.05, - python_path = NULL, - return_gobject = TRUE) { - - # data.table variables - cell_ID = NULL - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression - values = match.arg(expression_values, c('raw', 'normalized', 'scaled', 'custom')) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values) - - ## python path - if(is.null(python_path)) { - python_path = readGiottoInstructions(gobject, param = "python_path") - } - - ## source python file - reticulate::use_python(required = T, python = python_path) - reader_path = system.file("python", "SpatialDE_wrapper.py", package = 'Giotto') - reticulate::source_python(file = reader_path) - - - ## spatial locations - spatial_locs = get_spatial_locations(gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name) - spatial_locs <- as.data.frame(spatial_locs) - rownames(spatial_locs) <- spatial_locs$cell_ID - spatial_locs <- subset(spatial_locs, select = -cell_ID) - - # extract results you need - results = SpatialDE_results[['results']][['results']] - - ## automatic expression histology - AEH_results = Spatial_DE_AEH(filterd_exprs = as.data.frame(t_flex(as.matrix(expr_values))), - coordinates = spatial_locs, - results = as.data.frame(results), - pattern_num = pattern_num, - l = l) - histology_results <- as.data.frame(reticulate::py_to_r(AEH_results[[1]])) - cell_pattern_score <- as.data.frame((reticulate::py_to_r(AEH_results[[2]]))) - - spatial_pattern_results <- list(histology_results, cell_pattern_score) - names(spatial_pattern_results) <- c("histology_results","cell_pattern_score") - - - if(return_gobject == TRUE) { - - dt_res = data.table::as.data.table(spatial_pattern_results[['cell_pattern_score']]) - dt_res[['cell_ID']] = rownames(spatial_pattern_results[['cell_pattern_score']]) - gobject@spatial_enrichment[[name_pattern]] = dt_res - return(gobject) - - } else { - - return(list(results = spatial_pattern_results)) - - } + feat_type = NULL, + spat_unit = NULL, + spat_loc_name = "raw", + SpatialDE_results = NULL, + name_pattern = "AEH_patterns", + expression_values = c("raw", "normalized", "scaled", "custom"), + pattern_num = 6, + l = 1.05, + python_path = NULL, + return_gobject = TRUE) { 
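    # overview of the body below: resolve spat_unit/feat_type defaults, pull
    # the expression matrix and spatial locations, source SpatialDE_wrapper.py
    # through reticulate, then run automatic expression histology
    # (Spatial_DE_AEH) on the 'results' table produced by a prior spatialDE()
    # call; when return_gobject = TRUE the per-cell pattern scores are stored
    # in gobject@spatial_enrichment[[name_pattern]]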
+ # data.table variables + cell_ID <- NULL + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # expression + values <- match.arg( + expression_values, c("raw", "normalized", "scaled", "custom")) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values + ) + + ## python path + if (is.null(python_path)) { + python_path <- readGiottoInstructions(gobject, param = "python_path") + } + + ## source python file + reticulate::use_python(required = TRUE, python = python_path) + reader_path <- system.file( + "python", "SpatialDE_wrapper.py", package = "Giotto") + reticulate::source_python(file = reader_path) + + + ## spatial locations + spatial_locs <- getSpatialLocations(gobject, + spat_unit = spat_unit, + name = spat_loc_name + ) + spatial_locs <- as.data.frame(spatial_locs) + rownames(spatial_locs) <- spatial_locs$cell_ID + spatial_locs <- subset(spatial_locs, select = -cell_ID) + + # extract results you need + results <- SpatialDE_results[["results"]][["results"]] + + ## automatic expression histology + AEH_results <- Spatial_DE_AEH( + filterd_exprs = as.data.frame(t_flex(as.matrix(expr_values))), + coordinates = spatial_locs, + results = as.data.frame(results), + pattern_num = pattern_num, + l = l + ) + histology_results <- as.data.frame(reticulate::py_to_r(AEH_results[[1]])) + cell_pattern_score <- as.data.frame(reticulate::py_to_r(AEH_results[[2]])) + + spatial_pattern_results <- list(histology_results, cell_pattern_score) + names(spatial_pattern_results) <- c( + "histology_results", "cell_pattern_score") + + + if (return_gobject == TRUE) { + dt_res <- data.table::as.data.table( + spatial_pattern_results[["cell_pattern_score"]]) + dt_res[["cell_ID"]] <- rownames( + spatial_pattern_results[["cell_pattern_score"]]) + gobject@spatial_enrichment[[name_pattern]] <- dt_res + return(gobject) + } else { + return(list(results = spatial_pattern_results)) + } } @@ -2213,51 +2323,61 @@ spatialAEH <- function(gobject = NULL, #' @return ggplot object #' @keywords internal FSV_show <- function(results, - ms_results = NULL, - size = c(4,2,1), - color = c("blue", "green", "red"), - sig_alpha = 0.5, - unsig_alpha = 0.5){ - - results$FSV95conf = 2 * sqrt(results$s2_FSV) - results$intervals <- cut(results$FSV95conf,c(0, 1e-1, 1e0, Inf),label = F) - results$log_pval <- log10(results$pval) - - if(is.null(ms_results)){ - results$model_bic = results$model - } - else{ - results = merge(results,ms_results[,c("g","model")],by.x = "g",by.y = "g",all.x = T, - suffixes=(c(" ",'_bic'))) - } - - results$model_bic <- factor(results$model_bic) - results$intervals <- factor(results$intervals) - - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - pl <- pl + ggplot2::geom_point(data = results[results$qval < 0.05,], - ggplot2::aes_string(x = "FSV", y = "log_pval",fill = "model_bic",size = "intervals"), - show.legend = T, shape = 21,alpha = sig_alpha, - #size = size[results_cp_s$inftervals], - stroke = 0.1, color = "black") + - ggplot2::geom_point(data = results[results$qval > 0.05,], - ggplot2::aes_string(x = "FSV", y = "log_pval",size = "intervals"), - show.legend = T, shape = 21,alpha = unsig_alpha, - fill = "black", #size = size[results_cp_ns$inftervals], - stroke = 0.1, color = "black") + - ggplot2::scale_size_manual(values = size,guide=FALSE)+ - 
ggplot2::scale_color_manual(values = color)+ - ggplot2::scale_fill_discrete(name="Spatial Patterns", - breaks=c("linear", "PER", "SE"), - labels=c("linear", "periodical", "general"))+ - ggplot2::geom_hline(yintercept = max(results[results$qval < 0.05,]$log_pval),linetype = "dashed")+ - ggplot2::geom_text(ggplot2::aes(0.9,max(results[results$qval < 0.05,]$log_pval), - label = "FDR = 0.05", vjust = -1))+ - ggplot2::scale_y_reverse() - - print(pl) + ms_results = NULL, + size = c(4, 2, 1), + color = c("blue", "green", "red"), + sig_alpha = 0.5, + unsig_alpha = 0.5) { + results$FSV95conf <- 2 * sqrt(results$s2_FSV) + results$intervals <- cut( + results$FSV95conf, c(0, 1e-1, 1e0, Inf), label = FALSE) + results$log_pval <- log10(results$pval) + + if (is.null(ms_results)) { + results$model_bic <- results$model + } else { + results <- merge(results, ms_results[, c("g", "model")], + by.x = "g", by.y = "g", all.x = TRUE, + suffixes = (c(" ", "_bic")) + ) + } + + results$model_bic <- factor(results$model_bic) + results$intervals <- factor(results$intervals) + + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_point( + data = results[results$qval < 0.05, ], + ggplot2::aes_string( + x = "FSV", y = "log_pval", fill = "model_bic", size = "intervals"), + show.legend = TRUE, shape = 21, alpha = sig_alpha, + stroke = 0.1, color = "black" + ) + + ggplot2::geom_point( + data = results[results$qval > 0.05, ], + ggplot2::aes_string(x = "FSV", y = "log_pval", size = "intervals"), + show.legend = TRUE, shape = 21, alpha = unsig_alpha, + fill = "black", # size = size[results_cp_ns$inftervals], + stroke = 0.1, color = "black" + ) + + ggplot2::scale_size_manual(values = size, guide = FALSE) + + ggplot2::scale_color_manual(values = color) + + ggplot2::scale_fill_discrete( + name = "Spatial Patterns", + breaks = c("linear", "PER", "SE"), + labels = c("linear", "periodical", "general") + ) + + ggplot2::geom_hline(yintercept = max(results[ + results$qval < 0.05, ]$log_pval), linetype = "dashed") + + ggplot2::geom_text(ggplot2::aes(0.9, max(results[ + results$qval < 0.05, ]$log_pval), + label = "FDR = 0.05", vjust = -1 + )) + + ggplot2::scale_y_reverse() + + print(pl) } @@ -2272,86 +2392,101 @@ FSV_show <- function(results, #' @param spat_loc_name name for spatial locations #' @param expression_values gene expression values to use #' @param subset_genes subset of genes to run trendsceek on -#' @param nrand An integer specifying the number of random resamplings of the mark distribution as to create the null-distribution. -#' @param ncores An integer specifying the number of cores to be used by BiocParallel -#' @param \dots Additional parameters to the \code{\link[trendsceek]{trendsceek_test}} function +#' @param nrand An integer specifying the number of random resamplings of the +#' mark distribution as to create the null-distribution. 
+#' @param ncores An integer specifying the number of cores to be used by +#' BiocParallel +#' @param \dots Additional parameters to the +#' \code{\link[trendsceek]{trendsceek_test}} function #' @return data.frame with trendsceek spatial genes results -#' @details This function is a wrapper for the trendsceek_test method implemented in the trendsceek package +#' @details This function is a wrapper for the trendsceek_test method +#' implemented in the trendsceek package #' Publication: \doi{10.1038/nmeth.4634} #' @export trendSceek <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - spat_loc_name = 'raw', - expression_values = c("normalized", "raw"), - subset_genes = NULL, - nrand = 100, - ncores = 8, - ...) { - - # verify if optional package is installed - package_check(pkg_name = 'trendsceek', - repository = c('github'), - github_repo = 'edsgard/trendsceek') - - # print message with information # - message("using 'trendsceek' for spatial gene/pattern detection. If used in published research, please cite: - Edsgard, Daniel, Per Johnsson, and Rickard Sandberg. 'Identification of Spatial Expression Trends in Single-Cell Gene Expression Data.' - Nature Methods 15, no. 5 (May 2018): 339-42. https://doi.org/10.1038/nmeth.4634.") - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## expression data - values = match.arg(expression_values, c("normalized", "raw")) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values) - - ## normalization function - if (values == "normalized") { - log.fcn = NA - } - else if (values == "raw") { - log.fcn = log10 - } - - ## subset genes - if (!is.null(subset_genes)) { - subset_genes = subset_genes[subset_genes %in% gobject@gene_ID] - expr_values = expr_values[rownames(expr_values) %in% subset_genes, ] - } - - - ## initial locations - - # data.table variables - cell_ID = NULL - - spatial_locations = get_spatial_locations(gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name) - spatial_locations[, cell_ID := NULL] - pp = trendsceek::pos2pp(spatial_locations) - - ## initial gene counts - pp = trendsceek::set_marks(pp, as.matrix(expr_values), log.fcn = log.fcn) - - # eliminates running errors caused by too many zeros - pp[["marks"]] = pp[["marks"]] + 1e-7 - - ## run trendsceek - trendsceektest = trendsceek::trendsceek_test(pp, nrand = nrand, ncores = ncores, ...) - - ## get final results - trendsceektest = trendsceektest$supstats_wide - - return(trendsceektest) + feat_type = NULL, + spat_unit = NULL, + spat_loc_name = "raw", + expression_values = c("normalized", "raw"), + subset_genes = NULL, + nrand = 100, + ncores = 8, + ...) { + # verify if optional package is installed + package_check( + pkg_name = "trendsceek", + repository = c("github"), + github_repo = "edsgard/trendsceek" + ) + + # print message with information # + message("using 'trendsceek' for spatial gene/pattern detection. If used + in published research, please cite: + Edsgard, Daniel, Per Johnsson, and Rickard Sandberg. 'Identification of + Spatial Expression Trends in Single-Cell Gene Expression Data.' + Nature Methods 15, no. 5 (May 2018): 339-42. 
+ https://doi.org/10.1038/nmeth.4634.") + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + ## expression data + values <- match.arg(expression_values, c("normalized", "raw")) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values + ) + + ## normalization function + if (values == "normalized") { + log.fcn <- NA + } else if (values == "raw") { + log.fcn <- log10 + } + + ## subset genes + if (!is.null(subset_genes)) { + subset_genes <- subset_genes[subset_genes %in% gobject@gene_ID] + expr_values <- expr_values[rownames(expr_values) %in% subset_genes, ] + } + + + ## initial locations + + # data.table variables + cell_ID <- NULL + + spatial_locations <- getSpatialLocations(gobject, + spat_unit = spat_unit, + name = spat_loc_name + ) + spatial_locations[, cell_ID := NULL] + pp <- trendsceek::pos2pp(spatial_locations) + + ## initial gene counts + pp <- trendsceek::set_marks(pp, as.matrix(expr_values), log.fcn = log.fcn) + + # eliminates running errors caused by too many zeros + pp[["marks"]] <- pp[["marks"]] + 1e-7 + + ## run trendsceek + trendsceektest <- trendsceek::trendsceek_test( + pp, nrand = nrand, ncores = ncores, ...) + + ## get final results + trendsceektest <- trendsceektest$supstats_wide + + return(trendsceektest) } @@ -2368,11 +2503,14 @@ trendSceek <- function(gobject, #' @param min_count minimum number of counts for a gene to be included #' @param expression_values type of values to use (raw by default) #' @param num_core number of cores to use -#' @param covariates The covariates in experiments, i.e. confounding factors/batch effect. Column name of giotto cell metadata. +#' @param covariates The covariates in experiments, i.e. confounding +#' factors/batch effect. Column name of giotto cell metadata. #' @param return_object type of result to return (data.table or spark object) -#' @param \dots Additional parameters to the \code{\link[SPARK]{spark.vc}} function +#' @param \dots Additional parameters to the \code{\link[SPARK]{spark.vc}} +#' function #' @return data.table with SPARK spatial genes results or the SPARK object -#' @details This function is a wrapper for the method implemented in the SPARK package: +#' @details This function is a wrapper for the method implemented in the +#' SPARK package: #' \itemize{ #' \item{1. CreateSPARKObject }{create a SPARK object from a Giotto object} #' \item{2. spark.vc }{ Fits the count-based spatial model to estimate the parameters, @@ -2381,112 +2519,126 @@ trendSceek <- function(gobject, #' } #' Publication: \doi{10.1101/810903} #' @export -spark = function(gobject, - spat_loc_name = 'raw', - feat_type = NULL, - spat_unit = NULL, - percentage = 0.1, - min_count = 10, - expression_values = 'raw', - num_core = 5, - covariates = NULL, - return_object = c('data.table', 'spark'), - ...) 
{ - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # determine parameter - return_object = match.arg(return_object, c('data.table', 'spark')) - - # data.table variables - genes = adjusted_pvalue = combined_pvalue = NULL - - ## test if SPARK is installed ## - package_check(pkg_name = 'SPARK', - repository = c('github'), - github_repo = 'xzhoulab/SPARK') - - - # print message with information # - message("using 'SPARK' for spatial gene/pattern detection. If used in published research, please cite: - Sun, Shiquan, Jiaqiang Zhu, and Xiang Zhou. 'Statistical Analysis of Spatial Expression Pattern for Spatially Resolved Transcriptomic Studies.' - BioRxiv, October 21, 2019, 810903. https://doi.org/10.1101/810903.") - - - ## extract expression values from gobject - expr = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = expression_values) - - ## extract coordinates from gobject - locs = get_spatial_locations(gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name) - locs = as.data.frame(locs) - rownames(locs) = colnames(expr) - - ## create SPARK object for analysis and filter out lowly expressed genes - sobject = SPARK::CreateSPARKObject(counts = expr, - location = locs[,1:2], - percentage = percentage, - min_total_counts = min_count) - - ## total counts for each cell - sobject@lib_size = apply(sobject@counts, 2, sum) - - ## extract covariates ## - if(!is.null(covariates)) { - - # first filter giotto object based on spark object - filter_cell_ids = colnames(sobject@counts) - filter_gene_ids = rownames(sobject@counts) - tempgobject = subsetGiotto(gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cell_ids = filter_cell_ids, - feat_ids = filter_gene_ids) - - metadata = pDataDT(tempgobject) - - if(!covariates %in% colnames(metadata)) { - warning(covariates, ' was not found in the cell metadata of the giotto object, will be set to NULL \n') - covariates = NULL - } else { - covariates = metadata[[covariates]] +spark <- function(gobject, + spat_loc_name = "raw", + feat_type = NULL, + spat_unit = NULL, + percentage = 0.1, + min_count = 10, + expression_values = "raw", + num_core = 5, + covariates = NULL, + return_object = c("data.table", "spark"), + ...) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # determine parameter + return_object <- match.arg(return_object, c("data.table", "spark")) + + # data.table variables + genes <- adjusted_pvalue <- combined_pvalue <- NULL + + ## test if SPARK is installed ## + package_check( + pkg_name = "SPARK", + repository = c("github"), + github_repo = "xzhoulab/SPARK" + ) + + + # print message with information # + message("using 'SPARK' for spatial gene/pattern detection. If used in + published research, please cite: + Sun, Shiquan, Jiaqiang Zhu, and Xiang Zhou. 'Statistical Analysis of + Spatial Expression Pattern for Spatially Resolved Transcriptomic Studies.' + BioRxiv, October 21, 2019, 810903. 
https://doi.org/10.1101/810903.") + + + ## extract expression values from gobject + expr <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = expression_values + ) + + ## extract coordinates from gobject + locs <- getSpatialLocations(gobject, + spat_unit = spat_unit, + name = spat_loc_name + ) + locs <- as.data.frame(locs) + rownames(locs) <- colnames(expr) + + ## create SPARK object for analysis and filter out lowly expressed genes + sobject <- SPARK::CreateSPARKObject( + counts = expr, + location = locs[, 1:2], + percentage = percentage, + min_total_counts = min_count + ) + + ## total counts for each cell + sobject@lib_size <- apply(sobject@counts, 2, sum) + + ## extract covariates ## + if (!is.null(covariates)) { + # first filter giotto object based on spark object + filter_cell_ids <- colnames(sobject@counts) + filter_gene_ids <- rownames(sobject@counts) + tempgobject <- subsetGiotto(gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cell_ids = filter_cell_ids, + feat_ids = filter_gene_ids + ) + + metadata <- pDataDT(tempgobject) + + if (!covariates %in% colnames(metadata)) { + warning(covariates, " was not found in the cell metadata of the + giotto object, will be set to NULL") + covariates <- NULL + } else { + covariates <- metadata[[covariates]] + } + } + + ## Fit statistical model under null hypothesis + sobject <- SPARK::spark.vc(sobject, + covariates = covariates, + lib_size = sobject@lib_size, + num_core = num_core, + verbose = FALSE, + ... + ) + + ## test spatially expressed pattern genes + ## calculating pval + sobject <- SPARK::spark.test(sobject, + check_positive = TRUE, + verbose = FALSE + ) + + ## return results ## + if (return_object == "spark") { + return(sobject) + } else if (return_object == "data.table") { + DT_results <- data.table::as.data.table(sobject@res_mtest) + gene_names <- rownames(sobject@counts) + DT_results[, genes := gene_names] + data.table::setorder(DT_results, adjusted_pvalue, combined_pvalue) + return(DT_results) } - } - - ## Fit statistical model under null hypothesis - sobject = SPARK::spark.vc(sobject, - covariates = covariates, - lib_size = sobject@lib_size, - num_core = num_core, - verbose = F, - ...) - - ## test spatially expressed pattern genes - ## calculating pval - sobject = SPARK::spark.test(sobject, - check_positive = T, - verbose = F) - - ## return results ## - if(return_object == 'spark'){ - return(sobject) - }else if(return_object == 'data.table'){ - DT_results = data.table::as.data.table(sobject@res_mtest) - gene_names = rownames(sobject@counts) - DT_results[, genes := gene_names] - data.table::setorder(DT_results, adjusted_pvalue, combined_pvalue) - return(DT_results) - } } @@ -2498,10 +2650,12 @@ spark = function(gobject, #' @title detectSpatialPatterns #' @name detectSpatialPatterns -#' @description Identify spatial patterns through PCA on average expression in a spatial grid. +#' @description Identify spatial patterns through PCA on average expression +#' in a spatial grid. 
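# Editor's sketch (not part of the upstream patch): a minimal usage example for
# the spark() wrapper defined above. `g` is an assumed Giotto object with raw
# counts and "raw" spatial locations; parameter values are illustrative only.
spark_dt <- spark(
    gobject = g,
    expression_values = "raw",
    percentage = 0.1,
    min_count = 10,
    num_core = 2,
    return_object = "data.table"
)
spark_dt[adjusted_pvalue < 0.05, genes]  # genes passing the adjusted p-value cutoff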
#' @param gobject giotto object #' @param expression_values expression values to use -#' @param spatial_grid_name name of spatial grid to use (default = 'spatial_grid') +#' @param spatial_grid_name name of spatial grid to use +#' (default = 'spatial_grid') #' @param min_cells_per_grid minimum number of cells in a grid to be considered #' @param scale_unit scale features #' @param ncp number of principal components to calculate @@ -2517,134 +2671,151 @@ spark = function(gobject, #' } #' @export detectSpatialPatterns <- function(gobject, - expression_values = c('normalized', 'scaled', 'custom'), - spatial_grid_name = 'spatial_grid', - min_cells_per_grid = 4, - scale_unit = F, - ncp = 100, - show_plot = T, - PC_zscore = 1.5) { -########################################################################################## - stop(wrap_txt( - "This function has not been updated for use with the current version of Giotto. + expression_values = c("normalized", "scaled", "custom"), + spatial_grid_name = "spatial_grid", + min_cells_per_grid = 4, + scale_unit = FALSE, + ncp = 100, + show_plot = TRUE, + PC_zscore = 1.5) { + ############################################################################ + stop(wrap_txt( + "This function has not been updated for use with the current version + of Giotto. See details: https://github.com/drieslab/Giotto/issues/666#issuecomment-1540447537", - errWidth = TRUE) - ) -########################################################################################## - # expression values to be used - values = match.arg(expression_values, c('normalized', 'scaled', 'custom')) - expr_values = get_expression_values(gobject = gobject, - values = values, - output = 'matrix') - - - # spatial grid and spatial locations - if(is.null(slot(gobject, 'spatial_grid'))) { - stop("\n you need to create a spatial grid, see createSpatialGrid(), for this function to work \n") - } - if(!spatial_grid_name %in% list_spatial_grids_names(gobject = gobject)) { - stop("\n you need to provide an existing spatial grid name for this function to work \n") - } - - #spatial_grid = gobject@spatial_grid[[spatial_grid_name]] - spatial_grid = get_spatialGrid(gobject, spatial_grid_name) - - # annotate spatial locations with spatial grid information - # spatial_locs = copy(gobject@spatial_locs[['raw']]) - spatial_locs = get_spatial_locations(gobject, - spat_loc_name = 'raw', - output = 'data.table', - copy_obj = TRUE) - - if(all(c('sdimx', 'sdimy', 'sdimz') %in% colnames(spatial_locs))) { - spatial_locs = annotate_spatlocs_with_spatgrid_3D(spatloc = spatial_locs, spatgrid = spatial_grid) - } else if(all(c('sdimx', 'sdimy') %in% colnames(spatial_locs))) { - spatial_locs = annotate_spatlocs_with_spatgrid_2D(spatloc = spatial_locs, spatgrid = spatial_grid) - } - - - # data.table variables - gr_loc = zscore = variance.percent = loc_ID = gene_ID = NULL - - # filter grid, minimum number of cells per grid - cells_per_grid = sort(table(spatial_locs$gr_loc)) - cells_per_grid = cells_per_grid[cells_per_grid >= min_cells_per_grid] - loc_names = names(cells_per_grid) - - # average expression per grid - loc_av_expr_list <- list() - for(loc_name in loc_names) { - - loc_cell_IDs = spatial_locs[gr_loc == loc_name]$cell_ID - subset_expr = expr_values[, colnames(expr_values) %in% loc_cell_IDs] - if(is.vector(subset_expr) == TRUE) { - loc_av_expr = subset_expr + errWidth = TRUE + )) + ############################################################################ + # expression values to be used + values <- match.arg(expression_values, 
c("normalized", "scaled", "custom")) + expr_values <- getExpression( + gobject = gobject, + values = values, + output = "matrix" + ) + + + # spatial grid and spatial locations + if (is.null(slot(gobject, "spatial_grid"))) { + stop("you need to create a spatial grid, see createSpatialGrid(), + for this function to work") + } + if (!spatial_grid_name %in% list_spatial_grids_names(gobject = gobject)) { + stop("you need to provide an existing spatial grid name for this + function to work") + } + + spatial_grid <- getSpatialGrid(gobject, name = spatial_grid_name) + + # annotate spatial locations with spatial grid information + spatial_locs <- getSpatialLocations(gobject, + name = "raw", + output = "data.table", + copy_obj = TRUE + ) + + if (all(c("sdimx", "sdimy", "sdimz") %in% colnames(spatial_locs))) { + spatial_locs <- annotate_spatlocs_with_spatgrid_3D( + spatloc = spatial_locs, spatgrid = spatial_grid) + } else if (all(c("sdimx", "sdimy") %in% colnames(spatial_locs))) { + spatial_locs <- annotate_spatlocs_with_spatgrid_2D( + spatloc = spatial_locs, spatgrid = spatial_grid) + } + + + # data.table variables + gr_loc <- zscore <- variance.percent <- loc_ID <- gene_ID <- NULL + + # filter grid, minimum number of cells per grid + cells_per_grid <- sort(table(spatial_locs$gr_loc)) + cells_per_grid <- cells_per_grid[cells_per_grid >= min_cells_per_grid] + loc_names <- names(cells_per_grid) + + # average expression per grid + loc_av_expr_list <- list() + for (loc_name in loc_names) { + loc_cell_IDs <- spatial_locs[gr_loc == loc_name]$cell_ID + subset_expr <- expr_values[, colnames(expr_values) %in% loc_cell_IDs] + if (is.vector(subset_expr) == TRUE) { + loc_av_expr <- subset_expr + } else { + loc_av_expr <- rowMeans(subset_expr) + } + loc_av_expr_list[[loc_name]] <- loc_av_expr + } + loc_av_expr_matrix <- do.call("cbind", loc_av_expr_list) + + # START TEST + loc_av_expr_matrix <- as.matrix(loc_av_expr_matrix) + # STOP + + # perform pca on grid matrix + mypca <- FactoMineR::PCA( + X = t(loc_av_expr_matrix), + scale.unit = scale_unit, + ncp = ncp, + graph = FALSE) + + # screeplot + screeplot <- factoextra::fviz_eig(mypca, addlabels = TRUE, ylim = c(0, 50)) + if (show_plot == TRUE) { + print(screeplot) + } + + # select variable PCs + eig.val <- factoextra::get_eigenvalue(mypca) + eig.val_DT <- data.table::as.data.table(eig.val) + eig.val_DT$names <- rownames(eig.val) + eig.val_DT[, zscore := scale(variance.percent)] + eig.val_DT[, rank := rank(variance.percent)] + dims_to_keep <- eig.val_DT[zscore > PC_zscore]$names + + + # if no dimensions are kept, return message + if (is.null(dims_to_keep) | length(dims_to_keep) < 1) { + message("no PC dimensions retained, lower the PC zscore") + } + + # coordinates for cells + pca_matrix <- mypca$ind$coord + if (length(dims_to_keep) == 1) { + pca_matrix_DT <- data.table::data.table( + "dimkeep" = pca_matrix[, 1], + loc_ID = colnames(loc_av_expr_matrix) + ) + data.table::setnames(pca_matrix_DT, old = "dimkeep", dims_to_keep) } else { - loc_av_expr = rowMeans(subset_expr) + pca_matrix_DT <- data.table::as.data.table(pca_matrix[ + , seq_along(dims_to_keep)]) + pca_matrix_DT[, loc_ID := colnames(loc_av_expr_matrix)] } - loc_av_expr_list[[loc_name]] <- loc_av_expr - } - loc_av_expr_matrix = do.call('cbind', loc_av_expr_list) - - # START TEST - loc_av_expr_matrix = as.matrix(loc_av_expr_matrix) - # STOP - - # perform pca on grid matrix - mypca <- FactoMineR::PCA(X = t(loc_av_expr_matrix), scale.unit = scale_unit, ncp = ncp, graph = F) - - # screeplot - screeplot = 
factoextra::fviz_eig(mypca, addlabels = T, ylim = c(0, 50)) - if(show_plot == TRUE) { - print(screeplot) - } - - # select variable PCs - eig.val <- factoextra::get_eigenvalue(mypca) - eig.val_DT <- data.table::as.data.table(eig.val) - eig.val_DT$names = rownames(eig.val) - eig.val_DT[, zscore := scale(variance.percent)] - eig.val_DT[, rank := rank(variance.percent)] - dims_to_keep = eig.val_DT[zscore > PC_zscore]$names - - - # if no dimensions are kept, return message - if(is.null(dims_to_keep) | length(dims_to_keep) < 1) { - return(cat('\n no PC dimensions retained, lower the PC zscore \n')) - } - - # coordinates for cells - pca_matrix <- mypca$ind$coord - if(length(dims_to_keep) == 1) { - pca_matrix_DT = data.table::data.table('dimkeep' = pca_matrix[,1], - loc_ID = colnames(loc_av_expr_matrix)) - data.table::setnames(pca_matrix_DT, old = 'dimkeep', dims_to_keep) - } else { - pca_matrix_DT <- data.table::as.data.table(pca_matrix[,seq_along(dims_to_keep)]) - pca_matrix_DT[, loc_ID := colnames(loc_av_expr_matrix)] - } - - - # correlation of genes with PCs - feat_matrix <- mypca$var$cor - if(length(dims_to_keep) == 1) { - feat_matrix_DT = data.table::data.table('featkeep' = feat_matrix[,1], - gene_ID = rownames(loc_av_expr_matrix)) - data.table::setnames(feat_matrix_DT, old = 'featkeep', dims_to_keep) - } else { - feat_matrix_DT <- data.table::as.data.table(feat_matrix[,seq_along(dims_to_keep)]) - feat_matrix_DT[, gene_ID := rownames(loc_av_expr_matrix)] - } - - - spatPatObject = list(pca_matrix_DT = pca_matrix_DT, - feat_matrix_DT = feat_matrix_DT, - spatial_grid = spatial_grid) - - class(spatPatObject) <- append('spatPatObj', class(spatPatObject)) - - return(spatPatObject) + + + # correlation of genes with PCs + feat_matrix <- mypca$var$cor + if (length(dims_to_keep) == 1) { + feat_matrix_DT <- data.table::data.table( + "featkeep" = feat_matrix[, 1], + gene_ID = rownames(loc_av_expr_matrix) + ) + data.table::setnames(feat_matrix_DT, old = "featkeep", dims_to_keep) + } else { + feat_matrix_DT <- data.table::as.data.table(feat_matrix[ + , seq_along(dims_to_keep)]) + feat_matrix_DT[, gene_ID := rownames(loc_av_expr_matrix)] + } + + + spatPatObject <- list( + pca_matrix_DT = pca_matrix_DT, + feat_matrix_DT = feat_matrix_DT, + spatial_grid = spatial_grid + ) + + class(spatPatObject) <- append("spatPatObj", class(spatPatObject)) + + return(spatPatObject) } @@ -2663,76 +2834,86 @@ detectSpatialPatterns <- function(gobject, #' @param show_plot show plot #' @param return_plot return ggplot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @return ggplot #' @export showPattern2D <- function(gobject, - spatPatObj, - dimension = 1, - trim = c(0.02, 0.98), - background_color = 'white', - grid_border_color = 'grey', - show_legend = T, - point_size = 1, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'showPattern2D') { - - if(!'spatPatObj' %in% class(spatPatObj)) { - stop('\n spatPatObj needs to be the output from detectSpatialPatterns \n') - } - - # select PC and subset data - selected_PC = paste0('Dim.', dimension) - PC_DT = spatPatObj$pca_matrix_DT - 
if(!selected_PC %in% colnames(PC_DT)) { - stop('\n This dimension was not found in the spatial pattern object \n') - } - PC_DT = PC_DT[,c(selected_PC, 'loc_ID'), with = F] - - # annotate grid with PC values - annotated_grid = merge(spatPatObj$spatial_grid, by.x = 'gr_name', PC_DT, by.y = 'loc_ID') - - # trim PC values - if(!is.null(trim)) { - boundaries = stats::quantile(annotated_grid[[selected_PC]], probs = trim) - annotated_grid[[selected_PC]][annotated_grid[[selected_PC]] < boundaries[1]] = boundaries[1] - annotated_grid[[selected_PC]][annotated_grid[[selected_PC]] > boundaries[2]] = boundaries[2] - - } - - # 2D-plot - # - - - dpl <- ggplot2::ggplot() - dpl <- dpl + ggplot2::theme_bw() - dpl <- dpl + ggplot2::geom_tile(data = annotated_grid, - aes_string(x = 'x_start', y = 'y_start', fill = selected_PC), - color = grid_border_color, show.legend = show_legend) - dpl <- dpl + ggplot2::scale_fill_gradient2('low' = 'darkblue', mid = 'white', high = 'darkred', midpoint = 0, - guide = guide_legend(title = '')) - dpl <- dpl + ggplot2::theme(axis.text.x = element_text(size = 8, angle = 45, vjust = 1, hjust = 1), - panel.background = element_rect(fill = background_color), - panel.grid = element_blank(), - plot.title = element_text(hjust = 0.5)) - dpl <- dpl + ggplot2::labs(x = 'x coordinates', y = 'y coordinates') - - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = dpl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + spatPatObj, + dimension = 1, + trim = c(0.02, 0.98), + background_color = "white", + grid_border_color = "grey", + show_legend = TRUE, + point_size = 1, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "showPattern2D") { + if (!"spatPatObj" %in% class(spatPatObj)) { + stop("spatPatObj needs to be the output from detectSpatialPatterns") + } + + # select PC and subset data + selected_PC <- paste0("Dim.", dimension) + PC_DT <- spatPatObj$pca_matrix_DT + if (!selected_PC %in% colnames(PC_DT)) { + stop("This dimension was not found in the spatial pattern object") + } + PC_DT <- PC_DT[, c(selected_PC, "loc_ID"), with = FALSE] + + # annotate grid with PC values + annotated_grid <- merge( + spatPatObj$spatial_grid, by.x = "gr_name", PC_DT, by.y = "loc_ID") + + # trim PC values + if (!is.null(trim)) { + boundaries <- stats::quantile(annotated_grid[[ + selected_PC]], probs = trim) + annotated_grid[[selected_PC]][annotated_grid[[ + selected_PC]] < boundaries[1]] <- boundaries[1] + annotated_grid[[selected_PC]][annotated_grid[[ + selected_PC]] > boundaries[2]] <- boundaries[2] + } + + # 2D-plot + # + + + dpl <- ggplot2::ggplot() + dpl <- dpl + ggplot2::theme_bw() + dpl <- dpl + ggplot2::geom_tile( + data = annotated_grid, + aes_string(x = "x_start", y = "y_start", fill = selected_PC), + color = grid_border_color, show.legend = show_legend + ) + dpl <- dpl + ggplot2::scale_fill_gradient2( + "low" = "darkblue", mid = "white", high = "darkred", midpoint = 0, + guide = guide_legend(title = "") + ) + dpl <- dpl + ggplot2::theme( + axis.text.x = element_text(size = 8, angle = 45, vjust = 1, hjust = 1), + panel.background = element_rect(fill = background_color), + panel.grid = element_blank(), + plot.title = element_text(hjust = 0.5) + ) + dpl <- dpl + ggplot2::labs(x = "x coordinates", y = "y coordinates") + + + # output plot + 
return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = dpl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title showPattern @@ -2744,10 +2925,8 @@ showPattern2D <- function(gobject, #' @return ggplot #' @seealso \code{\link{showPattern2D}} #' @export -showPattern = function(gobject, spatPatObj, ...) { - - showPattern2D(gobject = gobject, spatPatObj = spatPatObj, ...) - +showPattern <- function(gobject, spatPatObj, ...) { + showPattern2D(gobject = gobject, spatPatObj = spatPatObj, ...) } #' @title showPattern3D @@ -2769,92 +2948,105 @@ showPattern = function(gobject, spatPatObj, ...) { #' @param show_plot show plot #' @param return_plot return plot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @return plotly #' @export showPattern3D <- function(gobject, - spatPatObj, - dimension = 1, - trim = c(0.02, 0.98), - background_color = 'white', - grid_border_color = 'grey', - show_legend = T, - point_size = 1, - axis_scale = c("cube","real","custom"), - custom_ratio = NULL, - x_ticks = NULL, - y_ticks = NULL, - z_ticks = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'showPattern3D') { - - # data.table variables - center_x = x_start = x_end = center_y = y_start = y_end = center_z = z_start = z_end = NULL - - if(!'spatPatObj' %in% class(spatPatObj)) { - stop('\n spatPatObj needs to be the output from detectSpatialPatterns \n') - } - - # select PC and subset data - selected_PC = paste0('Dim.', dimension) - PC_DT = spatPatObj$pca_matrix_DT - if(!selected_PC %in% colnames(PC_DT)) { - stop('\n This dimension was not found in the spatial pattern object \n') - } - PC_DT = PC_DT[,c(selected_PC, 'loc_ID'), with = F] - - # annotate grid with PC values - annotated_grid = merge(spatPatObj$spatial_grid, by.x = 'gr_name', PC_DT, by.y = 'loc_ID') - - # trim PC values - if(!is.null(trim)) { - boundaries = stats::quantile(annotated_grid[[selected_PC]], probs = trim) - annotated_grid[[selected_PC]][annotated_grid[[selected_PC]] < boundaries[1]] = boundaries[1] - annotated_grid[[selected_PC]][annotated_grid[[selected_PC]] > boundaries[2]] = boundaries[2] - - } - - - annotated_grid <- data.table(annotated_grid) - annotated_grid[,center_x:=(x_start+x_end)/2] - annotated_grid[,center_y:=(y_start+y_end)/2] - annotated_grid[,center_z:=(z_start+z_end)/2] - - - axis_scale = match.arg(axis_scale, c("cube","real","custom")) - - ratio = plotly_axis_scale_3D(annotated_grid,sdimx = "center_x",sdimy = "center_y",sdimz = "center_z", - mode = axis_scale,custom_ratio = custom_ratio) - - dpl <- plotly::plot_ly(type = 'scatter3d', - x = annotated_grid$center_x, y = annotated_grid$center_y, z = annotated_grid$center_z, - color = annotated_grid[[selected_PC]],marker = list(size = point_size), - mode = 'markers', colors = c( 'darkblue','white','darkred')) - dpl <- dpl %>% plotly::layout(scene = list( - xaxis = list(title = "X",nticks = x_ticks), - yaxis = list(title = "Y",nticks = y_ticks), - zaxis = list(title = 
"Z",nticks = z_ticks), - aspectmode='manual', - aspectratio = list(x=ratio[[1]], - y=ratio[[2]], - z=ratio[[3]]))) - dpl <- dpl %>% plotly::colorbar(title = paste(paste("dim.",dimension,sep = ""),"genes", sep = " ")) - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = dpl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + spatPatObj, + dimension = 1, + trim = c(0.02, 0.98), + background_color = "white", + grid_border_color = "grey", + show_legend = TRUE, + point_size = 1, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + x_ticks = NULL, + y_ticks = NULL, + z_ticks = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "showPattern3D") { + # data.table variables + center_x <- x_start <- x_end <- center_y <- y_start <- y_end <- + center_z <- z_start <- z_end <- NULL + + if (!"spatPatObj" %in% class(spatPatObj)) { + stop("spatPatObj needs to be the output from detectSpatialPatterns") + } + + # select PC and subset data + selected_PC <- paste0("Dim.", dimension) + PC_DT <- spatPatObj$pca_matrix_DT + if (!selected_PC %in% colnames(PC_DT)) { + stop("This dimension was not found in the spatial pattern object") + } + PC_DT <- PC_DT[, c(selected_PC, "loc_ID"), with = FALSE] + + # annotate grid with PC values + annotated_grid <- merge( + spatPatObj$spatial_grid, by.x = "gr_name", PC_DT, by.y = "loc_ID") + + # trim PC values + if (!is.null(trim)) { + boundaries <- stats::quantile(annotated_grid[[ + selected_PC]], probs = trim) + annotated_grid[[selected_PC]][annotated_grid[[ + selected_PC]] < boundaries[1]] <- boundaries[1] + annotated_grid[[selected_PC]][annotated_grid[[ + selected_PC]] > boundaries[2]] <- boundaries[2] + } + + + annotated_grid <- data.table(annotated_grid) + annotated_grid[, center_x := (x_start + x_end) / 2] + annotated_grid[, center_y := (y_start + y_end) / 2] + annotated_grid[, center_z := (z_start + z_end) / 2] + + + axis_scale <- match.arg(axis_scale, c("cube", "real", "custom")) + + ratio <- plotly_axis_scale_3D(annotated_grid, + sdimx = "center_x", sdimy = "center_y", sdimz = "center_z", + mode = axis_scale, custom_ratio = custom_ratio + ) + + dpl <- plotly::plot_ly( + type = "scatter3d", + x = annotated_grid$center_x, y = annotated_grid$center_y, z = annotated_grid$center_z, + color = annotated_grid[[selected_PC]], marker = list(size = point_size), + mode = "markers", colors = c("darkblue", "white", "darkred") + ) + dpl <- dpl %>% plotly::layout(scene = list( + xaxis = list(title = "X", nticks = x_ticks), + yaxis = list(title = "Y", nticks = y_ticks), + zaxis = list(title = "Z", nticks = z_ticks), + aspectmode = "manual", + aspectratio = list( + x = ratio[[1]], + y = ratio[[2]], + z = ratio[[3]] + ) + )) + dpl <- dpl %>% plotly::colorbar( + title = paste(paste("dim.", dimension, sep = ""), "genes", sep = " ")) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = dpl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -2869,73 +3061,80 @@ showPattern3D <- function(gobject, #' @param top_pos_genes Top positively correlated genes. #' @param top_neg_genes Top negatively correlated genes. 
#' @param point_size size of points -#' @param return_DT if TRUE, it will return the data.table used to generate the plots +#' @param return_DT if TRUE, it will return the data.table used to generate +#' the plots #' @param show_plot show plot #' @param return_plot return ggplot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param #' @return ggplot #' @export showPatternGenes <- function(gobject, - spatPatObj, - dimension = 1, - top_pos_genes = 5, - top_neg_genes = 5, - point_size = 1, - return_DT = FALSE, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'showPatternGenes') { - - # data.table variables - gene_ID = NULL - - if(!'spatPatObj' %in% class(spatPatObj)) { - stop('\n spatPatObj needs to be the output from detectSpatialPatterns \n') - } - - - # select PC to use - selected_PC = paste0('Dim.', dimension) - - gene_cor_DT = spatPatObj$feat_matrix_DT - if(!selected_PC %in% colnames(gene_cor_DT)) { - stop('\n This dimension was not found in the spatial pattern object \n') - } - gene_cor_DT = gene_cor_DT[,c(selected_PC, 'gene_ID'), with = F] - - # order and subset - gene_cor_DT = gene_cor_DT[!is.na(get(selected_PC))][order(get(selected_PC))] - - subset = gene_cor_DT[c(1:top_neg_genes, (nrow(gene_cor_DT)-top_pos_genes):nrow(gene_cor_DT))] - subset[, gene_ID := factor(gene_ID, gene_ID)] - - ## return DT and make not plot ## - if(return_DT == TRUE) { - return(subset) - } - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - pl <- pl + ggplot2::geom_point(data = subset, aes_string(x = selected_PC, y = 'gene_ID'), size = point_size) - pl <- pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) - pl <- pl + ggplot2::labs(x = 'correlation', y = '', title = selected_PC) - pl <- pl + ggplot2::theme(plot.title = element_text(hjust = 0.5)) - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + spatPatObj, + dimension = 1, + top_pos_genes = 5, + top_neg_genes = 5, + point_size = 1, + return_DT = FALSE, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "showPatternGenes") { + # data.table variables + gene_ID <- NULL + + if (!"spatPatObj" %in% class(spatPatObj)) { + stop("spatPatObj needs to be the output from detectSpatialPatterns") + } + + + # select PC to use + selected_PC <- paste0("Dim.", dimension) + + gene_cor_DT <- spatPatObj$feat_matrix_DT + if (!selected_PC %in% colnames(gene_cor_DT)) { + stop("This dimension was not found in the spatial pattern object") + } + gene_cor_DT <- gene_cor_DT[, c(selected_PC, "gene_ID"), with = FALSE] + + # order and subset + gene_cor_DT <- gene_cor_DT[ + !is.na(get(selected_PC))][order(get(selected_PC))] + + subset <- gene_cor_DT[ + c(1:top_neg_genes, (nrow( + gene_cor_DT) - top_pos_genes):nrow(gene_cor_DT))] + subset[, gene_ID := factor(gene_ID, gene_ID)] + + ## return DT and make not plot ## + if (return_DT == TRUE) { + 
return(subset) + } + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_point( + data = subset, + aes_string(x = selected_PC, y = "gene_ID"), size = point_size) + pl <- pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) + pl <- pl + ggplot2::labs(x = "correlation", y = "", title = selected_PC) + pl <- pl + ggplot2::theme(plot.title = element_text(hjust = 0.5)) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -2948,63 +3147,66 @@ showPatternGenes <- function(gobject, #' @param top_neg_genes Top negatively correlated genes. #' @param min_pos_cor Minimum positive correlation score to include a gene. #' @param min_neg_cor Minimum negative correlation score to include a gene. -#' @param return_top_selection only return selection based on correlation criteria (boolean) +#' @param return_top_selection only return selection based on correlation +#' criteria (boolean) #' @return Data.table with genes associated with selected dimension (PC). #' @details Description. #' @export selectPatternGenes <- function(spatPatObj, - dimensions = 1:5, - top_pos_genes = 10, - top_neg_genes = 10, - min_pos_cor = 0.5, - min_neg_cor = -0.5, - return_top_selection = FALSE) { - - - if(!'spatPatObj' %in% class(spatPatObj)) { - stop('\n spatPatObj needs to be the output from detectSpatialPatterns \n') - } - - # data.table variables - top_pos_rank = value = top_neg_rank = topvalue = gene_ID = variable = NULL - - - # select PC to use - selected_PCs = paste0('Dim.', dimensions) - gene_cor_DT = spatPatObj$feat_matrix_DT - if(any(selected_PCs %in% colnames(gene_cor_DT) == F)) { - stop('\n not all dimensions were found back \n') - } - gene_cor_DT = gene_cor_DT[,c(selected_PCs, 'gene_ID'), with = FALSE] - - # melt and select - gene_cor_DT_m = data.table::melt.data.table(gene_cor_DT, id.vars = 'gene_ID') - gene_cor_DT_m[, top_pos_rank := rank(value), by = 'variable'] - gene_cor_DT_m[, top_neg_rank := rank(-value), by = 'variable'] - selection = gene_cor_DT_m[top_pos_rank %in% 1:top_pos_genes | top_neg_rank %in% 1:top_neg_genes] + dimensions = 1:5, + top_pos_genes = 10, + top_neg_genes = 10, + min_pos_cor = 0.5, + min_neg_cor = -0.5, + return_top_selection = FALSE) { + if (!"spatPatObj" %in% class(spatPatObj)) { + stop("spatPatObj needs to be the output from detectSpatialPatterns") + } - # filter on min correlation - selection = selection[value > min_pos_cor | value < min_neg_cor] + # data.table variables + top_pos_rank <- value <- top_neg_rank <- topvalue <- gene_ID <- + variable <- NULL - # return all the top correlated genes + information - if(return_top_selection == TRUE) { - return(selection) - } - # remove duplicated genes by only retaining the most correlated dimension - selection[, topvalue := max(abs(value)), by = 'gene_ID'] - uniq_selection = selection[value == topvalue] + # select PC to use + selected_PCs <- paste0("Dim.", dimensions) + gene_cor_DT <- spatPatObj$feat_matrix_DT + if (any(selected_PCs %in% colnames(gene_cor_DT) == FALSE)) { + stop("not all dimensions were found back") + } + gene_cor_DT <- gene_cor_DT[, c(selected_PCs, "gene_ID"), with = FALSE] + + # melt and select + gene_cor_DT_m <- data.table::melt.data.table( + gene_cor_DT, id.vars = "gene_ID") + gene_cor_DT_m[, top_pos_rank := rank(value), by = "variable"] + gene_cor_DT_m[, 
top_neg_rank := rank(-value), by = "variable"] + selection <- gene_cor_DT_m[ + top_pos_rank %in% 1:top_pos_genes | top_neg_rank %in% 1:top_neg_genes] + + # filter on min correlation + selection <- selection[value > min_pos_cor | value < min_neg_cor] + + # return all the top correlated genes + information + if (return_top_selection == TRUE) { + return(selection) + } - # add other genes back - output_selection = uniq_selection[,.(gene_ID, variable)] - other_genes = gene_cor_DT[!gene_ID %in% output_selection$gene_ID][['gene_ID']] - other_genes_DT = data.table::data.table(gene_ID = other_genes, variable = 'noDim') + # remove duplicated genes by only retaining the most correlated dimension + selection[, topvalue := max(abs(value)), by = "gene_ID"] + uniq_selection <- selection[value == topvalue] - comb_output_genes = rbind(output_selection, other_genes_DT) - setnames(comb_output_genes, 'variable', 'patDim') + # add other genes back + output_selection <- uniq_selection[, .(gene_ID, variable)] + other_genes <- gene_cor_DT[!gene_ID %in% output_selection$gene_ID][[ + "gene_ID"]] + other_genes_DT <- data.table::data.table( + gene_ID = other_genes, variable = "noDim") - return(comb_output_genes) + comb_output_genes <- rbind(output_selection, other_genes_DT) + setnames(comb_output_genes, "variable", "patDim") + return(comb_output_genes) } @@ -3025,83 +3227,98 @@ selectPatternGenes <- function(spatPatObj, #' @param subset_feats subset of features to use #' @param spatial_network_name name of spatial network to use #' @param b smoothing factor beteen 0 and 1 (default: automatic) -#' @return matrix with smoothened gene expression values based on kNN spatial network -#' @details This function will smoothen the gene expression values per cell according to -#' its neighbors in the selected spatial network. \cr -#' b is a smoothening factor that defaults to 1 - 1/k, where k is the median number of -#' k-neighbors in the selected spatial network. Setting b = 0 means no smoothing and b = 1 -#' means no contribution from its own expression. +#' @return matrix with smoothened gene expression values based on kNN +#' spatial network +#' @details This function will smoothen the gene expression values per cell +#' according to its neighbors in the selected spatial network. \cr +#' b is a smoothening factor that defaults to 1 - 1/k, where k is the median +#' number of k-neighbors in the selected spatial network. Setting b = 0 means +#' no smoothing and b = 1 means no contribution from its own expression. 
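# Editor's sketch (not part of the upstream patch): a toy numeric illustration
# of the b weighting described in the @details above, mirroring the combination
# used in the function body (smoothed = b * neighbour_mean + (1 - b) * own value).
# All values are made up.
k <- 4                     # median number of k-neighbours in the spatial network
b <- 1 - 1 / k             # default smoothing factor: 0.75
own_value <- 2.0           # a cell's own expression for one gene
neighbour_mean <- 3.2      # mean expression of its k neighbours
b * neighbour_mean + (1 - b) * own_value   # smoothed value: 2.9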
#' @keywords internal
-do_spatial_knn_smoothing = function(expression_matrix,
-                                    spatial_network,
-                                    subset_feats = NULL,
-                                    b = NULL) {
-
-  # checks
-  if(!is.null(b)) {
-    if(b > 1 | b < 0) {
-      stop('b needs to be between 0 (no spatial contribution) and 1 (only spatial contribution)')
+do_spatial_knn_smoothing <- function(expression_matrix,
+        spatial_network,
+        subset_feats = NULL,
+        b = NULL) {
+    # checks
+    if (!is.null(b)) {
+        if (b > 1 | b < 0) {
+            stop("b needs to be between 0 (no spatial contribution) and 1
+            (only spatial contribution)")
+        }
     }
-  }
-
-  # check if spatial network and expression matrix are compatible
-  compatible_spatial_network(spatial_network = spatial_network,
-                             expression_matrix = expression_matrix)
-
-  # matrix
-  expr_values = expression_matrix
-  if(!is.null(subset_feats)) {
-    expr_values = expr_values[rownames(expr_values) %in% subset_feats,]
-  }
-
-  # data.table variables
-  feat_ID = value = NULL
-
-  # merge spatial network with expression data
-  expr_values_dt = data.table::as.data.table(as.matrix(expr_values)); expr_values_dt[, feat_ID := rownames(expr_values)]
-  expr_values_dt_m = data.table::melt.data.table(expr_values_dt, id.vars = 'feat_ID', variable.name = 'cell_ID')
+    # check if spatial network and expression matrix are compatible
+    compatible_spatial_network(
+        spatial_network = spatial_network,
+        expression_matrix = expression_matrix
+    )

-  # merge spatial network and matrix
-  spatial_network_ext = data.table::merge.data.table(spatial_network, expr_values_dt_m,
-                                                     by.x = 'from', by.y = 'cell_ID', allow.cartesian = T)
+    # matrix
+    expr_values <- expression_matrix
+    if (!is.null(subset_feats)) {
+        expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ]
+    }

-  # calculate mean over all k-neighbours
-  # exclude 0's?
-  # trimmed mean?
-  spatial_network_ext_smooth = spatial_network_ext[, mean(value), by = c('to', 'feat_ID')]
+    # data.table variables
+    feat_ID <- value <- NULL

-  # convert back to matrix
-  spatial_smooth_dc = data.table::dcast.data.table(data = spatial_network_ext_smooth, formula = feat_ID~to, value.var = 'V1')
-  spatial_smooth_matrix = dt_to_matrix(spatial_smooth_dc)
+    # merge spatial network with expression data
+    expr_values_dt <- data.table::as.data.table(as.matrix(expr_values))
+    expr_values_dt[, feat_ID := rownames(expr_values)]
+    expr_values_dt_m <- data.table::melt.data.table(
+        expr_values_dt, id.vars = "feat_ID", variable.name = "cell_ID")

-  # if network was not fully connected, some cells might be missing and are not smoothed
-  # add the original values for those cells back
-  all_cells = colnames(expr_values)
-  smoothed_cells = colnames(spatial_smooth_matrix)
-  missing_cells = all_cells[!all_cells %in% smoothed_cells]
-  if(length(missing_cells) > 0) {
-    missing_matrix = expr_values[, missing_cells]
-    spatial_smooth_matrix = cbind(spatial_smooth_matrix[rownames(expr_values),], missing_matrix)
-  }
+    # merge spatial network and matrix
+    spatial_network_ext <- data.table::merge.data.table(
+        spatial_network, expr_values_dt_m,
+        by.x = "from", by.y = "cell_ID",
+        allow.cartesian = TRUE
+    )

-  spatial_smooth_matrix = spatial_smooth_matrix[rownames(expr_values), colnames(expr_values)]
+    # calculate mean over all k-neighbours
+    # exclude 0's?
+    # trimmed mean?
+ spatial_network_ext_smooth <- spatial_network_ext[ + , mean(value), by = c("to", "feat_ID")] + + # convert back to matrix + spatial_smooth_dc <- data.table::dcast.data.table( + data = spatial_network_ext_smooth, + formula = feat_ID ~ to, + value.var = "V1") + spatial_smooth_matrix <- dt_to_matrix(spatial_smooth_dc) + + # if network was not fully connected, some cells might be missing and + # are not smoothed + # add the original values for those cells back + all_cells <- colnames(expr_values) + smoothed_cells <- colnames(spatial_smooth_matrix) + missing_cells <- all_cells[!all_cells %in% smoothed_cells] + + if (length(missing_cells) > 0) { + missing_matrix <- expr_values[, missing_cells] + spatial_smooth_matrix <- cbind(spatial_smooth_matrix[ + rownames(expr_values), ], missing_matrix) + } - # combine original and smoothed values according to smoothening b - # create best guess for b if not given - if(is.null(b)) { - k = stats::median(table(spatial_network$to)) - smooth_b = 1 - 1/k - } else { - smooth_b = b - } + spatial_smooth_matrix <- spatial_smooth_matrix[ + rownames(expr_values), colnames(expr_values)] - expr_b = 1 - smooth_b - spatsmooth_expr_values = ((smooth_b*spatial_smooth_matrix) + (expr_b*expr_values)) + # combine original and smoothed values according to smoothening b + # create best guess for b if not given + if (is.null(b)) { + k <- stats::median(table(spatial_network$to)) + smooth_b <- 1 - 1 / k + } else { + smooth_b <- b + } - return(spatsmooth_expr_values) + expr_b <- 1 - smooth_b + spatsmooth_expr_values <- ((smooth_b * spatial_smooth_matrix) + ( + expr_b * expr_values)) + return(spatsmooth_expr_values) } @@ -3110,22 +3327,20 @@ do_spatial_knn_smoothing = function(expression_matrix, #' @title Evaluate provided spatial locations #' @name evaluate_provided_spatial_locations #' @keywords internal -evaluate_provided_spatial_locations = function(spatial_locs) { - - if(!inherits(spatial_locs, 'data.frame')) { - stop('The spatial locations must be a data.frame(-like) object \n') - } - - locs_names = colnames(spatial_locs) - required_cols = c('sdimx', 'sdimy', 'cell_ID') - missing_cols = required_cols[!required_cols %in% locs_names] +evaluate_provided_spatial_locations <- function(spatial_locs) { + if (!inherits(spatial_locs, "data.frame")) { + stop("The spatial locations must be a data.frame(-like) object") + } - if(length(missing_cols) > 0) { - stop('missing columns: ', list(missing_cols)) - } else { - return(TRUE) - } + locs_names <- colnames(spatial_locs) + required_cols <- c("sdimx", "sdimy", "cell_ID") + missing_cols <- required_cols[!required_cols %in% locs_names] + if (length(missing_cols) > 0) { + stop("missing columns: ", list(missing_cols)) + } else { + return(TRUE) + } } @@ -3135,60 +3350,59 @@ evaluate_provided_spatial_locations = function(spatial_locs) { #' @description smooth gene expression over a defined spatial grid #' @return matrix with smoothened gene expression values based on spatial grid #' @keywords internal -do_spatial_grid_averaging = function(expression_matrix, - spatial_grid, - spatial_locs, - subset_feats = NULL, - min_cells_per_grid = 4) { - - - # matrix - expr_values = expression_matrix - if(!is.null(subset_feats)) { - expr_values = expr_values[rownames(expr_values) %in% subset_feats,] - } - - # check spatial grid - if(!inherits(x = spatial_grid, what = "spatialGridObj")) { - stop('spatial_grid needs to be spatialGridObj') - } - - # check spatial locations - evaluate_provided_spatial_locations(spatial_locs = spatial_locs) - - # annoate spatial 
locations with spatial grid - if(all(c('sdimx', 'sdimy', 'sdimz') %in% colnames(spatial_locs))) { - spatial_locs = annotate_spatlocs_with_spatgrid_3D(spatloc = spatial_locs, spatgrid = spatial_grid) - } else if(all(c('sdimx', 'sdimy') %in% colnames(spatial_locs))) { - spatial_locs = annotate_spatlocs_with_spatgrid_2D(spatloc = spatial_locs, spatgrid = spatial_grid) - } - - - # data.table variables - gr_loc = NULL - - # filter grid, minimum number of cells per grid - cells_per_grid = sort(table(spatial_locs$gr_loc)) - cells_per_grid = cells_per_grid[cells_per_grid >= min_cells_per_grid] - loc_names = names(cells_per_grid) - - # average expression per grid - loc_av_expr_list <- list() - for(loc_name in loc_names) { - - loc_cell_IDs = spatial_locs[gr_loc == loc_name]$cell_ID - subset_expr = expr_values[, colnames(expr_values) %in% loc_cell_IDs] - if(is.vector(subset_expr) == TRUE) { - loc_av_expr = subset_expr - } else { - loc_av_expr = rowMeans(subset_expr) +do_spatial_grid_averaging <- function(expression_matrix, + spatial_grid, + spatial_locs, + subset_feats = NULL, + min_cells_per_grid = 4) { + # matrix + expr_values <- expression_matrix + if (!is.null(subset_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ] } - loc_av_expr_list[[loc_name]] <- loc_av_expr - } - loc_av_expr_matrix = do.call('cbind', loc_av_expr_list) - loc_av_expr_matrix = as.matrix(loc_av_expr_matrix) - return(loc_av_expr_matrix) + # check spatial grid + if (!inherits(x = spatial_grid, what = "spatialGridObj")) { + stop("spatial_grid needs to be spatialGridObj") + } + + # check spatial locations + evaluate_provided_spatial_locations(spatial_locs = spatial_locs) + + # annoate spatial locations with spatial grid + if (all(c("sdimx", "sdimy", "sdimz") %in% colnames(spatial_locs))) { + spatial_locs <- annotate_spatlocs_with_spatgrid_3D( + spatloc = spatial_locs, spatgrid = spatial_grid) + } else if (all(c("sdimx", "sdimy") %in% colnames(spatial_locs))) { + spatial_locs <- annotate_spatlocs_with_spatgrid_2D( + spatloc = spatial_locs, spatgrid = spatial_grid) + } + + + # data.table variables + gr_loc <- NULL + + # filter grid, minimum number of cells per grid + cells_per_grid <- sort(table(spatial_locs$gr_loc)) + cells_per_grid <- cells_per_grid[cells_per_grid >= min_cells_per_grid] + loc_names <- names(cells_per_grid) + + # average expression per grid + loc_av_expr_list <- list() + for (loc_name in loc_names) { + loc_cell_IDs <- spatial_locs[gr_loc == loc_name]$cell_ID + subset_expr <- expr_values[, colnames(expr_values) %in% loc_cell_IDs] + if (is.vector(subset_expr) == TRUE) { + loc_av_expr <- subset_expr + } else { + loc_av_expr <- rowMeans(subset_expr) + } + loc_av_expr_list[[loc_name]] <- loc_av_expr + } + loc_av_expr_matrix <- do.call("cbind", loc_av_expr_list) + loc_av_expr_matrix <- as.matrix(loc_av_expr_matrix) + + return(loc_av_expr_matrix) } @@ -3202,13 +3416,16 @@ do_spatial_grid_averaging = function(expression_matrix, #' @param spatial_grid provided spatial grid #' @param spatial_locs provided spatial locations #' @param subset_feats subset of features to use -#' @param network_smoothing smoothing factor beteen 0 and 1 (default: automatic) +#' @param network_smoothing smoothing factor beteen 0 and 1 +#' (default: automatic) #' @param min_cells_per_grid minimum number of cells to consider a grid #' @param cor_method correlation method #' @return returns a spatial correlation object: "spatCorObject" #' @details -#' For method = network, it expects a fully connected spatial network. 
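# Editor's sketch (not part of the upstream patch): a minimal example of the
# spatial location format required by the internal checker
# evaluate_provided_spatial_locations() above, and hence by the grid-based
# correlation helpers; toy values only.
spatial_locs <- data.frame(
    sdimx = c(10.0, 12.5, 40.2),
    sdimy = c(5.0, 7.1, 20.8),
    cell_ID = c("cell_1", "cell_2", "cell_3")
)
evaluate_provided_spatial_locations(spatial_locs)   # returns TRUE when valid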
You can make sure to create a -#' fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function. +#' For method = network, it expects a fully connected spatial network. +#' You can make sure to create a +#' fully connected network by setting minimal_k > 0 in the +#' \code{\link{createSpatialNetwork}} function. #' \itemize{ #' \item{1. grid-averaging: }{average gene expression values within a predefined spatial grid} #' \item{2. network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell @@ -3221,103 +3438,114 @@ do_spatial_grid_averaging = function(expression_matrix, #' @seealso \code{\link{showSpatialCorFeats}} #' @export detectSpatialCorFeatsMatrix <- function(expression_matrix, - method = c('grid', 'network'), - spatial_network, - spatial_grid, - spatial_locs, - subset_feats = NULL, - network_smoothing = NULL, - min_cells_per_grid = 4, - cor_method = c('pearson', 'kendall', 'spearman')) { - - - ## correlation method to be used - cor_method = match.arg(cor_method, choices = c('pearson', 'kendall', 'spearman')) - - ## method to be used - method = match.arg(method, choices = c('grid', 'network')) - - ## spatial averaging or smoothing - if(method == 'grid') { - - loc_av_expr_matrix = do_spatial_grid_averaging(expression_matrix = as.matrix(expression_matrix), - spatial_grid = spatial_grid, - spatial_locs = spatial_locs, - subset_feats = subset_feats, - min_cells_per_grid = min_cells_per_grid) - - # data.table variables - feat_ID = variable = NULL - - cor_spat_matrix = cor_flex(t_flex(as.matrix(loc_av_expr_matrix)), method = cor_method) - cor_spat_matrixDT = data.table::as.data.table(cor_spat_matrix) - cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] - cor_spat_DT = data.table::melt.data.table(data = cor_spat_matrixDT, - id.vars = 'feat_ID', value.name = 'spat_cor') - } - - if(method == 'network') { - - knn_av_expr_matrix = do_spatial_knn_smoothing(expression_matrix = as.matrix(expression_matrix), - spatial_network = spatial_network, - subset_feats = subset_feats, - b = network_smoothing) - - - - cor_spat_matrix = cor_flex(t_flex(as.matrix(knn_av_expr_matrix)), method = cor_method) - cor_spat_matrixDT = data.table::as.data.table(cor_spat_matrix) - cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] - cor_spat_DT = data.table::melt.data.table(data = cor_spat_matrixDT, - id.vars = 'feat_ID', value.name = 'spat_cor') - - - } - + method = c("grid", "network"), + spatial_network, + spatial_grid, + spatial_locs, + subset_feats = NULL, + network_smoothing = NULL, + min_cells_per_grid = 4, + cor_method = c("pearson", "kendall", "spearman")) { + ## correlation method to be used + cor_method <- match.arg( + cor_method, choices = c("pearson", "kendall", "spearman")) + + ## method to be used + method <- match.arg(method, choices = c("grid", "network")) + + ## spatial averaging or smoothing + if (method == "grid") { + loc_av_expr_matrix <- do_spatial_grid_averaging( + expression_matrix = as.matrix(expression_matrix), + spatial_grid = spatial_grid, + spatial_locs = spatial_locs, + subset_feats = subset_feats, + min_cells_per_grid = min_cells_per_grid + ) + + # data.table variables + feat_ID <- variable <- NULL + + cor_spat_matrix <- cor_flex(t_flex( + as.matrix(loc_av_expr_matrix)), method = cor_method) + cor_spat_matrixDT <- data.table::as.data.table(cor_spat_matrix) + cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] + cor_spat_DT <- data.table::melt.data.table( + data = cor_spat_matrixDT, + id.vars = 
"feat_ID", value.name = "spat_cor" + ) + } + if (method == "network") { + knn_av_expr_matrix <- do_spatial_knn_smoothing( + expression_matrix = as.matrix(expression_matrix), + spatial_network = spatial_network, + subset_feats = subset_feats, + b = network_smoothing + ) - # data.table variables - cordiff = spat_cor = expr_cor = spatrank= exprrank = rankdiff = NULL - ## 2. perform expression correlation at single-cell level without spatial information - # matrix - expr_values = expression_matrix - if(!is.null(subset_feats)) { - expr_values = expr_values[rownames(expr_values) %in% subset_feats,] - } + cor_spat_matrix <- cor_flex(t_flex(as.matrix( + knn_av_expr_matrix)), method = cor_method) + cor_spat_matrixDT <- data.table::as.data.table(cor_spat_matrix) + cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] + cor_spat_DT <- data.table::melt.data.table( + data = cor_spat_matrixDT, + id.vars = "feat_ID", value.name = "spat_cor" + ) + } - cor_matrix = cor_flex(t_flex(expr_values), method = cor_method) - cor_matrixDT = data.table::as.data.table(cor_matrix) - cor_matrixDT[, feat_ID := rownames(cor_matrix)] - cor_DT = data.table::melt.data.table(data = cor_matrixDT, - id.vars = 'feat_ID', value.name = 'expr_cor') - ## 3. merge spatial and expression correlation - data.table::setorder(cor_spat_DT, feat_ID, variable) - data.table::setorder(cor_DT, feat_ID, variable) - doubleDT = cbind(cor_spat_DT, expr_cor = cor_DT[['expr_cor']]) - # difference in correlation scores - doubleDT[, cordiff := spat_cor - expr_cor] + # data.table variables + cordiff <- spat_cor <- expr_cor <- spatrank <- exprrank <- rankdiff <- NULL - # difference in rank scores - doubleDT[, spatrank := data.table::frank(-spat_cor, ties.method = 'first'), by = feat_ID] - doubleDT[, exprrank := data.table::frank(-expr_cor, ties.method = 'first'), by = feat_ID] - doubleDT[, rankdiff := spatrank - exprrank] + ## 2. perform expression correlation at single-cell level without + ## spatial information - # sort data - data.table::setorder(doubleDT, feat_ID, -spat_cor) + # matrix + expr_values <- expression_matrix + if (!is.null(subset_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ] + } - spatCorObject = list(cor_DT = doubleDT, - feat_order = rownames(cor_spat_matrix), - cor_hclust = list(), - cor_clusters = list()) + cor_matrix <- cor_flex(t_flex(expr_values), method = cor_method) + cor_matrixDT <- data.table::as.data.table(cor_matrix) + cor_matrixDT[, feat_ID := rownames(cor_matrix)] + cor_DT <- data.table::melt.data.table( + data = cor_matrixDT, + id.vars = "feat_ID", value.name = "expr_cor" + ) - class(spatCorObject) = append(class(spatCorObject), 'spatCorObject') + ## 3. 
merge spatial and expression correlation + data.table::setorder(cor_spat_DT, feat_ID, variable) + data.table::setorder(cor_DT, feat_ID, variable) + doubleDT <- cbind(cor_spat_DT, expr_cor = cor_DT[["expr_cor"]]) + + # difference in correlation scores + doubleDT[, cordiff := spat_cor - expr_cor] + + # difference in rank scores + doubleDT[, spatrank := data.table::frank( + -spat_cor, ties.method = "first"), by = feat_ID] + doubleDT[, exprrank := data.table::frank( + -expr_cor, ties.method = "first"), by = feat_ID] + doubleDT[, rankdiff := spatrank - exprrank] + + # sort data + data.table::setorder(doubleDT, feat_ID, -spat_cor) + + spatCorObject <- list( + cor_DT = doubleDT, + feat_order = rownames(cor_spat_matrix), + cor_hclust = list(), + cor_clusters = list() + ) - return(spatCorObject) + class(spatCorObject) <- append(class(spatCorObject), "spatCorObject") + return(spatCorObject) } @@ -3333,14 +3561,17 @@ detectSpatialCorFeatsMatrix <- function(expression_matrix, #' @param expression_values gene expression values to use #' @param subset_feats subset of feats to use #' @param spatial_network_name name of spatial network to use -#' @param network_smoothing smoothing factor beteen 0 and 1 (default: automatic) +#' @param network_smoothing smoothing factor beteen 0 and 1 +#' (default: automatic) #' @param spatial_grid_name name of spatial grid to use #' @param min_cells_per_grid minimum number of cells to consider a grid #' @param cor_method correlation method #' @return returns a spatial correlation object: "spatCorObject" #' @details -#' For method = network, it expects a fully connected spatial network. You can make sure to create a -#' fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function. +#' For method = network, it expects a fully connected spatial network. You +#' can make sure to create a +#' fully connected network by setting minimal_k > 0 in the +#' \code{\link{createSpatialNetwork}} function. #' \itemize{ #' \item{1. grid-averaging: }{average gene expression values within a predefined spatial grid} #' \item{2. 
network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell @@ -3353,140 +3584,165 @@ detectSpatialCorFeatsMatrix <- function(expression_matrix, #' @seealso \code{\link{showSpatialCorFeats}} #' @export detectSpatialCorFeats <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - spat_loc_name = 'raw', - method = c('grid', 'network'), - expression_values = c('normalized', 'scaled', 'custom'), - subset_feats = NULL, - spatial_network_name = 'Delaunay_network', - network_smoothing = NULL, - spatial_grid_name = 'spatial_grid', - min_cells_per_grid = 4, - cor_method = c('pearson', 'kendall', 'spearman')) { - - # set default spat_unit and feat_type - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## correlation method to be used - cor_method = match.arg(cor_method, choices = c('pearson', 'kendall', 'spearman')) - - ## method to be used - method = match.arg(method, choices = c('grid', 'network')) - - # get expression matrix - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - if(!is.null(subset_feats)) { - expr_values = expr_values[rownames(expr_values) %in% subset_feats,] - } - - - - # get spatial locations - spatial_locs = get_spatial_locations(gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name, - output = 'data.table', - copy_obj = TRUE) - - ## spatial averaging or smoothing - if(method == 'grid') { - - # get spatial grid - spatial_grid = get_spatialGrid(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - name = spatial_grid_name, - return_grid_Obj = FALSE) - - loc_av_expr_matrix = do_spatial_grid_averaging(expression_matrix = as.matrix(expr_values), - spatial_grid = spatial_grid, - spatial_locs = spatial_locs, - subset_feats = subset_feats, - min_cells_per_grid = min_cells_per_grid) - - # data.table variables - feat_ID = variable = NULL - - cor_spat_matrix = cor_flex(t_flex(as.matrix(loc_av_expr_matrix)), method = cor_method) - cor_spat_matrixDT = data.table::as.data.table(cor_spat_matrix) - cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] - cor_spat_DT = data.table::melt.data.table(data = cor_spat_matrixDT, - id.vars = 'feat_ID', value.name = 'spat_cor') - } - - if(method == 'network') { - - # get spatial network - spatial_network = get_spatialNetwork(gobject = gobject, - spat_unit = spat_unit, - name = spatial_network_name, - output = 'networkDT') - - knn_av_expr_matrix = do_spatial_knn_smoothing(expression_matrix = as.matrix(expr_values), - spatial_network = spatial_network, - subset_feats = subset_feats, - b = network_smoothing) - - - - - cor_spat_matrix = cor_flex(t_flex(as.matrix(knn_av_expr_matrix)), method = cor_method) - cor_spat_matrixDT = data.table::as.data.table(cor_spat_matrix) - cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] - cor_spat_DT = data.table::melt.data.table(data = cor_spat_matrixDT, - id.vars = 'feat_ID', value.name = 'spat_cor') - + spat_unit = NULL, + feat_type = NULL, + spat_loc_name = "raw", + method = c("grid", "network"), + expression_values = c("normalized", "scaled", "custom"), + subset_feats = NULL, + spatial_network_name = "Delaunay_network", + network_smoothing = NULL, + spatial_grid_name = "spatial_grid", + 
min_cells_per_grid = 4, + cor_method = c("pearson", "kendall", "spearman")) { + # set default spat_unit and feat_type + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } + ## correlation method to be used + cor_method <- match.arg( + cor_method, choices = c("pearson", "kendall", "spearman")) + + ## method to be used + method <- match.arg(method, choices = c("grid", "network")) + + # get expression matrix + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + if (!is.null(subset_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% subset_feats, ] + } - # data.table variables - cordiff = spat_cor = expr_cor = spatrank= exprrank = rankdiff = NULL - ## 2. perform expression correlation at single-cell level without spatial information - cor_matrix = cor_flex(t_flex(expr_values), method = cor_method) - cor_matrixDT = data.table::as.data.table(cor_matrix) - cor_matrixDT[, feat_ID := rownames(cor_matrix)] - cor_DT = data.table::melt.data.table(data = cor_matrixDT, - id.vars = 'feat_ID', value.name = 'expr_cor') + # get spatial locations + spatial_locs <- getSpatialLocations(gobject, + spat_unit = spat_unit, + name = spat_loc_name, + output = "data.table", + copy_obj = TRUE + ) - ## 3. merge spatial and expression correlation - data.table::setorder(cor_spat_DT, feat_ID, variable) - data.table::setorder(cor_DT, feat_ID, variable) - doubleDT = cbind(cor_spat_DT, expr_cor = cor_DT[['expr_cor']]) + ## spatial averaging or smoothing + if (method == "grid") { + # get spatial grid + spatial_grid <- getSpatialGrid( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = spatial_grid_name, + return_grid_Obj = FALSE + ) + + loc_av_expr_matrix <- do_spatial_grid_averaging( + expression_matrix = as.matrix(expr_values), + spatial_grid = spatial_grid, + spatial_locs = spatial_locs, + subset_feats = subset_feats, + min_cells_per_grid = min_cells_per_grid + ) + + # data.table variables + feat_ID <- variable <- NULL + + cor_spat_matrix <- cor_flex(t_flex(as.matrix( + loc_av_expr_matrix)), method = cor_method) + cor_spat_matrixDT <- data.table::as.data.table(cor_spat_matrix) + cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] + cor_spat_DT <- data.table::melt.data.table( + data = cor_spat_matrixDT, + id.vars = "feat_ID", value.name = "spat_cor" + ) + } - # difference in correlation scores - doubleDT[, cordiff := spat_cor - expr_cor] + if (method == "network") { + # get spatial network + spatial_network <- getSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + name = spatial_network_name, + output = "networkDT" + ) + + knn_av_expr_matrix <- do_spatial_knn_smoothing( + expression_matrix = as.matrix(expr_values), + spatial_network = spatial_network, + subset_feats = subset_feats, + b = network_smoothing + ) + + + + + cor_spat_matrix <- cor_flex(t_flex(as.matrix( + knn_av_expr_matrix)), method = cor_method) + cor_spat_matrixDT <- data.table::as.data.table(cor_spat_matrix) + cor_spat_matrixDT[, feat_ID := rownames(cor_spat_matrix)] + cor_spat_DT <- data.table::melt.data.table( + data = cor_spat_matrixDT, + id.vars = "feat_ID", value.name = "spat_cor" + ) + } - # difference in rank scores - doubleDT[, spatrank := 
frank(-spat_cor, ties.method = 'first'), by = feat_ID] - doubleDT[, exprrank := frank(-expr_cor, ties.method = 'first'), by = feat_ID] - doubleDT[, rankdiff := spatrank - exprrank] - # sort data - data.table::setorder(doubleDT, feat_ID, -spat_cor) - spatCorObject = list(cor_DT = doubleDT, - feat_order = rownames(cor_spat_matrix), - cor_hclust = list(), - cor_clusters = list()) + # data.table variables + cordiff <- spat_cor <- expr_cor <- spatrank <- exprrank <- rankdiff <- NULL + + ## 2. perform expression correlation at single-cell level without + ## spatial information + cor_matrix <- cor_flex(t_flex(expr_values), method = cor_method) + cor_matrixDT <- data.table::as.data.table(cor_matrix) + cor_matrixDT[, feat_ID := rownames(cor_matrix)] + cor_DT <- data.table::melt.data.table( + data = cor_matrixDT, + id.vars = "feat_ID", value.name = "expr_cor" + ) - class(spatCorObject) = append('spatCorObject', class(spatCorObject)) + ## 3. merge spatial and expression correlation + data.table::setorder(cor_spat_DT, feat_ID, variable) + data.table::setorder(cor_DT, feat_ID, variable) + doubleDT <- cbind(cor_spat_DT, expr_cor = cor_DT[["expr_cor"]]) + + # difference in correlation scores + doubleDT[, cordiff := spat_cor - expr_cor] + + # difference in rank scores + doubleDT[, spatrank := data.table::frank( + -spat_cor, ties.method = "first"), by = feat_ID] + doubleDT[, exprrank := data.table::frank( + -expr_cor, ties.method = "first"), by = feat_ID] + doubleDT[, rankdiff := spatrank - exprrank] + + # sort data + data.table::setorder(doubleDT, feat_ID, -spat_cor) + + spatCorObject <- list( + cor_DT = doubleDT, + feat_order = rownames(cor_spat_matrix), + cor_hclust = list(), + cor_clusters = list() + ) - return(spatCorObject) + class(spatCorObject) <- append("spatCorObject", class(spatCorObject)) + return(spatCorObject) } @@ -3502,58 +3758,64 @@ detectSpatialCorFeats <- function(gobject, #' @param subset_feats subset of feats to use #' @param subset_genes deprecated, use \code{subset_feats} #' @param spatial_network_name name of spatial network to use -#' @param network_smoothing smoothing factor beteen 0 and 1 (default: automatic) +#' @param network_smoothing smoothing factor between 0 and 1 +#' (default: automatic) #' @param spatial_grid_name name of spatial grid to use #' @param min_cells_per_grid minimum number of cells to consider a grid #' @param cor_method correlation method #' @return returns a spatial correlation object: "spatCorObject" #' @details -#' For method = network, it expects a fully connected spatial network. You can make sure to create a -#' fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function. +#' For method = network, it expects a fully connected spatial network. You +#' can make sure to create a +#' fully connected network by setting minimal_k > 0 in the +#' \code{\link{createSpatialNetwork}} function. #' \itemize{ -#' \item{1. grid-averaging: }{average gene expression values within a predefined spatial grid} -#' \item{2. network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell -#' by using the neighbours within the predefined spatial network. b is a smoothening factor -#' that defaults to 1 - 1/k, where k is the median number of k-neighbors in the -#' selected spatial network. Setting b = 0 means no smoothing and b = 1 means no contribution +#' \item{1. grid-averaging: }{average gene expression values within a +#' predefined spatial grid} +#' \item{2. 
network-averaging: }{smoothens the gene expression matrix by +#' averaging the expression within one cell +#' by using the neighbours within the predefined spatial network. b is a +#' smoothening factor that defaults to 1 - 1/k, where k is the median +#' number of k-neighbors in the selected spatial network. Setting b = 0 +#' means no smoothing and b = 1 means no contribution #' from its own expression.} #' } #' The spatCorObject can be further explored with showSpatialCorGenes() #' @seealso \code{\link{showSpatialCorGenes}} #' @export detectSpatialCorGenes <- function(gobject, - feat_type = NULL, - spat_unit = NULL, - method = c('grid', 'network'), - expression_values = c('normalized', 'scaled', 'custom'), - subset_feats = NULL, - subset_genes = NULL, - spatial_network_name = 'Delaunay_network', - network_smoothing = NULL, - spatial_grid_name = 'spatial_grid', - min_cells_per_grid = 4, - cor_method = c('pearson', 'kendall', 'spearman')) { - - ## deprecated arguments - if(!is.null(subset_genes)) { - subset_feats = subset_genes - warning('subset_genes is deprecated, use subset_feats in the future \n') - } - - warning("Deprecated and replaced by detectSpatialCorFeats") - - detectSpatialCorFeats(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - method = method, - expression_values = expression_values, - subset_feats = subset_feats, - spatial_network_name = spatial_network_name, - network_smoothing = network_smoothing, - spatial_grid_name = spatial_grid_name, - min_cells_per_grid = min_cells_per_grid, - cor_method = cor_method) + feat_type = NULL, + spat_unit = NULL, + method = c("grid", "network"), + expression_values = c("normalized", "scaled", "custom"), + subset_feats = NULL, + subset_genes = NULL, + spatial_network_name = "Delaunay_network", + network_smoothing = NULL, + spatial_grid_name = "spatial_grid", + min_cells_per_grid = 4, + cor_method = c("pearson", "kendall", "spearman")) { + ## deprecated arguments + if (!is.null(subset_genes)) { + subset_feats <- subset_genes + warning("subset_genes is deprecated, use subset_feats in the future") + } + + warning("Deprecated and replaced by detectSpatialCorFeats") + detectSpatialCorFeats( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + method = method, + expression_values = expression_values, + subset_feats = subset_feats, + spatial_network_name = spatial_network_name, + network_smoothing = network_smoothing, + spatial_grid_name = spatial_grid_name, + min_cells_per_grid = min_cells_per_grid, + cor_method = cor_method + ) } @@ -3569,79 +3831,81 @@ detectSpatialCorGenes <- function(gobject, #' @param feats subset of features to show #' @param min_spat_cor filter on minimum spatial correlation #' @param min_expr_cor filter on minimum single-cell expression correlation -#' @param min_cor_diff filter on minimum correlation difference (spatial vs expression) -#' @param min_rank_diff filter on minimum correlation rank difference (spatial vs expression) +#' @param min_cor_diff filter on minimum correlation difference +#' (spatial vs expression) +#' @param min_rank_diff filter on minimum correlation rank difference +#' (spatial vs expression) #' @param show_top_feats show top features per gene #' @return data.table with filtered information #' @export -showSpatialCorFeats = function(spatCorObject, - use_clus_name = NULL, - selected_clusters = NULL, - feats = NULL, - min_spat_cor = 0.5, - min_expr_cor = NULL, - min_cor_diff = NULL, - min_rank_diff = NULL, - show_top_feats = NULL) { - - # data.table variables 
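As a toy illustration of the smoothing factor described in the details above: with a median of k neighbours the default is b = 1 - 1/k, and the documented endpoints are b = 0 (no smoothing) and b = 1 (no contribution from the cell's own expression). The weighted-average form below is only an assumption made for illustration, since the actual computation is performed by do_spatial_knn_smoothing(); the values of k, own_expr and neighbour_expr are made up.

k <- 5                                         # median number of k-neighbours
b <- 1 - 1 / k                                 # default smoothing factor: 0.8
own_expr <- 2.0                                # toy expression value of one cell
neighbour_expr <- c(1.5, 0.5, 3.0, 2.5, 1.0)   # toy values of its neighbours
smoothed <- (1 - b) * own_expr + b * mean(neighbour_expr)
smoothed                                       # 0.2 * 2.0 + 0.8 * 1.7 = 1.76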
- clus = feat_ID = spat_cor = cor_diff = rankdiff = NULL - - if(!'spatCorObject' %in% class(spatCorObject)) { - stop('\n spatCorObject needs to be the output from detectSpatialCorfeats() \n') - } - - filter_DT = data.table::copy(spatCorObject[['cor_DT']]) - - if(!is.null(use_clus_name)) { - - clusters_part = spatCorObject[['cor_clusters']][[use_clus_name]] - - # combine spatial correlation info and clusters - clusters = clusters_part - names_clusters = names(clusters_part) - clusters_DT = data.table::data.table('feat_ID' = names_clusters, 'clus' = clusters) - filter_DT = data.table::merge.data.table(filter_DT, clusters_DT, by = 'feat_ID') - } - - ## 0. subset clusters - if(!is.null(selected_clusters)) { - filter_DT = filter_DT[clus %in% selected_clusters] - } - - - ## 1. subset feats - if(!is.null(feats)) { - filter_DT = filter_DT[feat_ID %in% feats] - } - - ## 2. select spatial correlation - if(!is.null(min_spat_cor)) { - filter_DT = filter_DT[spat_cor >= min_spat_cor] - } - - ## 3. minimum expression correlation - if(!is.null(min_expr_cor)) { - filter_DT = filter_DT[spat_cor >= min_expr_cor] - } - - ## 4. minimum correlation difference - if(!is.null(min_cor_diff)) { - filter_DT = filter_DT[cor_diff >= min_cor_diff] - } - - ## 5. minimum correlation difference - if(!is.null(min_rank_diff)) { - filter_DT = filter_DT[rankdiff >= min_rank_diff] - } - - ## 6. show only top feats - if(!is.null(show_top_feats)) { - filter_DT = filter_DT[, head(.SD, show_top_feats), by = feat_ID] - } - - return(filter_DT) +showSpatialCorFeats <- function(spatCorObject, + use_clus_name = NULL, + selected_clusters = NULL, + feats = NULL, + min_spat_cor = 0.5, + min_expr_cor = NULL, + min_cor_diff = NULL, + min_rank_diff = NULL, + show_top_feats = NULL) { + # data.table variables + clus <- feat_ID <- spat_cor <- expr_cor <- cordiff <- rankdiff <- NULL + + if (!"spatCorObject" %in% class(spatCorObject)) { + stop("spatCorObject needs to be the output from + detectSpatialCorFeats()") + } + + filter_DT <- data.table::copy(spatCorObject[["cor_DT"]]) + if (!is.null(use_clus_name)) { + clusters_part <- spatCorObject[["cor_clusters"]][[use_clus_name]] + + # combine spatial correlation info and clusters + clusters <- clusters_part + names_clusters <- names(clusters_part) + clusters_DT <- data.table::data.table( + "feat_ID" = names_clusters, "clus" = clusters) + filter_DT <- data.table::merge.data.table( + filter_DT, clusters_DT, by = "feat_ID") + } + + ## 0. subset clusters + if (!is.null(selected_clusters)) { + filter_DT <- filter_DT[clus %in% selected_clusters] + } + + + ## 1. subset feats + if (!is.null(feats)) { + filter_DT <- filter_DT[feat_ID %in% feats] + } + + ## 2. select spatial correlation + if (!is.null(min_spat_cor)) { + filter_DT <- filter_DT[spat_cor >= min_spat_cor] + } + + ## 3. minimum expression correlation + if (!is.null(min_expr_cor)) { + filter_DT <- filter_DT[expr_cor >= min_expr_cor] + } + + ## 4. minimum correlation difference + if (!is.null(min_cor_diff)) { + filter_DT <- filter_DT[cordiff >= min_cor_diff] + } + + ## 5. minimum rank difference + if (!is.null(min_rank_diff)) { + filter_DT <- filter_DT[rankdiff >= min_rank_diff] + } + + ## 6. 
show only top feats + if (!is.null(show_top_feats)) { + filter_DT <- filter_DT[, head(.SD, show_top_feats), by = feat_ID] + } + + return(filter_DT) } @@ -3655,34 +3919,35 @@ showSpatialCorFeats = function(spatCorObject, #' @param genes subset of genes to show #' @param min_spat_cor filter on minimum spatial correlation #' @param min_expr_cor filter on minimum single-cell expression correlation -#' @param min_cor_diff filter on minimum correlation difference (spatial vs expression) -#' @param min_rank_diff filter on minimum correlation rank difference (spatial vs expression) +#' @param min_cor_diff filter on minimum correlation difference +#' (spatial vs expression) +#' @param min_rank_diff filter on minimum correlation rank difference +#' (spatial vs expression) #' @param show_top_genes show top genes per gene #' @return data.table with filtered information #' @export -showSpatialCorGenes = function(spatCorObject, - use_clus_name = NULL, - selected_clusters = NULL, - genes = NULL, - min_spat_cor = 0.5, - min_expr_cor = NULL, - min_cor_diff = NULL, - min_rank_diff = NULL, - show_top_genes = NULL) { - - warning("Deprecated and replaced by showSpatialCorFeats") - - showSpatialCorFeats(spatCorObject = spatCorObject, - use_clus_name = use_clus_name, - selected_clusters = selected_clusters, - feats = genes, - min_spat_cor = min_spat_cor, - min_expr_cor = min_expr_cor, - min_cor_diff = min_cor_diff, - min_rank_diff = min_rank_diff, - show_top_feats = show_top_genes) - - +showSpatialCorGenes <- function(spatCorObject, + use_clus_name = NULL, + selected_clusters = NULL, + genes = NULL, + min_spat_cor = 0.5, + min_expr_cor = NULL, + min_cor_diff = NULL, + min_rank_diff = NULL, + show_top_genes = NULL) { + warning("Deprecated and replaced by showSpatialCorFeats") + + showSpatialCorFeats( + spatCorObject = spatCorObject, + use_clus_name = use_clus_name, + selected_clusters = selected_clusters, + feats = genes, + min_spat_cor = min_spat_cor, + min_expr_cor = min_expr_cor, + min_cor_diff = min_cor_diff, + min_rank_diff = min_rank_diff, + show_top_feats = show_top_genes + ) } @@ -3701,43 +3966,41 @@ showSpatialCorGenes = function(spatCorObject, #' @param return_obj return spatial correlation object (spatCorObject) #' @return spatCorObject or cluster results #' @export -clusterSpatialCorFeats = function(spatCorObject, - name = 'spat_clus', - hclust_method = 'ward.D', - k = 10, - return_obj = TRUE) { - - - # check input - if(!'spatCorObject' %in% class(spatCorObject)) { - stop('\n spatCorObject needs to be the output from detectSpatialCorfeats() \n') - } - - # create correlation matrix - cor_DT = spatCorObject[['cor_DT']] - cor_DT_dc = data.table::dcast.data.table(cor_DT, formula = feat_ID~variable, value.var = 'spat_cor') - cor_matrix = dt_to_matrix(cor_DT_dc) - - # re-ordering matrix - my_feat_order = spatCorObject[['feat_order']] - cor_matrix = cor_matrix[my_feat_order, my_feat_order] +clusterSpatialCorFeats <- function(spatCorObject, + name = "spat_clus", + hclust_method = "ward.D", + k = 10, + return_obj = TRUE) { + # check input + if (!"spatCorObject" %in% class(spatCorObject)) { + stop("spatCorObject needs to be the output from + detectSpatialCorfeats()") + } - # cluster - cor_dist = stats::as.dist(1-cor_matrix) - cor_h = stats::hclust(d = cor_dist, method = hclust_method) - cor_clus = stats::cutree(cor_h, k = k) + # create correlation matrix + cor_DT <- spatCorObject[["cor_DT"]] + cor_DT_dc <- data.table::dcast.data.table( + cor_DT, formula = feat_ID ~ variable, value.var = "spat_cor") + cor_matrix 
<- dt_to_matrix(cor_DT_dc) - if(return_obj == TRUE) { - spatCorObject[['cor_hclust']][[name]] = cor_h - spatCorObject[['cor_clusters']][[name]] = cor_clus - spatCorObject[['cor_coexpr_groups']][[name]] = NA + # re-ordering matrix + my_feat_order <- spatCorObject[["feat_order"]] + cor_matrix <- cor_matrix[my_feat_order, my_feat_order] - return(spatCorObject) + # cluster + cor_dist <- stats::as.dist(1 - cor_matrix) + cor_h <- stats::hclust(d = cor_dist, method = hclust_method) + cor_clus <- stats::cutree(cor_h, k = k) - } else { - return(list('hclust' = cor_h, 'clusters' = cor_clus)) - } + if (return_obj == TRUE) { + spatCorObject[["cor_hclust"]][[name]] <- cor_h + spatCorObject[["cor_clusters"]][[name]] <- cor_clus + spatCorObject[["cor_coexpr_groups"]][[name]] <- NA + return(spatCorObject) + } else { + return(list("hclust" = cor_h, "clusters" = cor_clus)) + } } @@ -3753,20 +4016,20 @@ clusterSpatialCorFeats = function(spatCorObject, #' @param return_obj return spatial correlation object (spatCorObject) #' @return spatCorObject or cluster results #' @export -clusterSpatialCorGenes = function(spatCorObject, - name = 'spat_clus', - hclust_method = 'ward.D', - k = 10, - return_obj = TRUE) { - - warning("Deprecated and replaced by clusterSpatialCorFeats") - - clusterSpatialCorFeats(spatCorObject = spatCorObject, - name = name, - hclust_method = hclust_method, - k = k, - return_obj = return_obj) - +clusterSpatialCorGenes <- function(spatCorObject, + name = "spat_clus", + hclust_method = "ward.D", + k = 10, + return_obj = TRUE) { + warning("Deprecated and replaced by clusterSpatialCorFeats") + + clusterSpatialCorFeats( + spatCorObject = spatCorObject, + name = name, + hclust_method = hclust_method, + k = k, + return_obj = return_obj + ) } @@ -3778,7 +4041,8 @@ clusterSpatialCorGenes = function(spatCorObject, #' @description Create heatmap of spatially correlated features #' @param gobject giotto object #' @param spatCorObject spatial correlation object -#' @param use_clus_name name of clusters to visualize (from clusterSpatialCorFeats()) +#' @param use_clus_name name of clusters to visualize +#' (from clusterSpatialCorFeats()) #' @param show_cluster_annot show cluster annotation on top of heatmap #' @param show_row_dend show row dendrogram #' @param show_column_dend show column dendrogram @@ -3787,98 +4051,104 @@ clusterSpatialCorGenes = function(spatCorObject, #' @param show_plot show plot #' @param return_plot return ggplot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param -#' @param \dots additional parameters to the \code{\link[ComplexHeatmap]{Heatmap}} function from ComplexHeatmap +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param +#' @param \dots additional parameters to the +#' \code{\link[ComplexHeatmap]{Heatmap}} function from ComplexHeatmap #' @return Heatmap generated by ComplexHeatmap #' @export -heatmSpatialCorFeats = function(gobject, - spatCorObject, - use_clus_name = NULL, - show_cluster_annot = TRUE, - show_row_dend = T, - show_column_dend = F, - show_row_names = F, - show_column_names = F, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'heatmSpatialCorFeats', - ...) 
{ - - ## check input - if(!'spatCorObject' %in% class(spatCorObject)) { - stop('\n spatCorObject needs to be the output from detectSpatialCorFeats() \n') - } - - ## package check for ComplexHeatmap - package_check(pkg_name = 'ComplexHeatmap', repository = 'CRAN') - - ## create correlation matrix - cor_DT = spatCorObject[['cor_DT']] - cor_DT_dc = data.table::dcast.data.table(cor_DT, formula = feat_ID~variable, value.var = 'spat_cor') - cor_matrix = dt_to_matrix(cor_DT_dc) - - # re-ordering matrix - my_feat_order = spatCorObject[['feat_order']] - cor_matrix = cor_matrix[my_feat_order, my_feat_order] - - - ## fix row and column names - cor_matrix = cor_matrix[rownames(cor_matrix), rownames(cor_matrix)] - - ## default top annotation - ha = NULL - - if(!is.null(use_clus_name)) { - hclust_part = spatCorObject[['cor_hclust']][[use_clus_name]] - - if(is.null(hclust_part)) { - cat(use_clus_name, ' does not exist, make one with spatCorCluster \n') - hclust_part = TRUE +heatmSpatialCorFeats <- function(gobject, + spatCorObject, + use_clus_name = NULL, + show_cluster_annot = TRUE, + show_row_dend = TRUE, + show_column_dend = FALSE, + show_row_names = FALSE, + show_column_names = FALSE, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "heatmSpatialCorFeats", + ...) { + ## check input + if (!"spatCorObject" %in% class(spatCorObject)) { + stop("spatCorObject needs to be the output from + detectSpatialCorFeats()") + } - } else { - clusters_part = spatCorObject[['cor_clusters']][[use_clus_name]] + ## package check for ComplexHeatmap + package_check(pkg_name = "ComplexHeatmap", repository = "CRAN") + + ## create correlation matrix + cor_DT <- spatCorObject[["cor_DT"]] + cor_DT_dc <- data.table::dcast.data.table( + cor_DT, formula = feat_ID ~ variable, value.var = "spat_cor") + cor_matrix <- dt_to_matrix(cor_DT_dc) + + # re-ordering matrix + my_feat_order <- spatCorObject[["feat_order"]] + cor_matrix <- cor_matrix[my_feat_order, my_feat_order] + + + ## fix row and column names + cor_matrix <- cor_matrix[rownames(cor_matrix), rownames(cor_matrix)] - if(show_cluster_annot) { - uniq_clusters = unique(clusters_part) + ## default top annotation + ha <- NULL - # color vector - mycolors = getDistinctColors(length(uniq_clusters)) - names(mycolors) = uniq_clusters - ha = ComplexHeatmap::HeatmapAnnotation(bar = as.vector(clusters_part), - col = list(bar = mycolors), - annotation_legend_param = list(title = NULL)) - } + if (!is.null(use_clus_name)) { + hclust_part <- spatCorObject[["cor_hclust"]][[use_clus_name]] + if (is.null(hclust_part)) { + cat(use_clus_name, " does not exist, make one with spatCorCluster") + hclust_part <- TRUE + } else { + clusters_part <- spatCorObject[["cor_clusters"]][[use_clus_name]] + + if (show_cluster_annot) { + uniq_clusters <- unique(clusters_part) + + # color vector + mycolors <- getDistinctColors(length(uniq_clusters)) + names(mycolors) <- uniq_clusters + ha <- ComplexHeatmap::HeatmapAnnotation( + bar = as.vector(clusters_part), + col = list(bar = mycolors), + annotation_legend_param = list(title = NULL) + ) + } + } + } else { + hclust_part <- TRUE } - } else { - hclust_part = TRUE - } - - - ## create heatmap - heatm = ComplexHeatmap::Heatmap(matrix = as.matrix(cor_matrix), - cluster_rows = hclust_part, - cluster_columns = hclust_part, - show_row_dend = show_row_dend, - show_column_dend = show_column_dend, - show_row_names = show_row_names, - show_column_names = show_column_names, - top_annotation = ha, ...) 
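A minimal usage sketch of how the functions in this file chain together, assuming a Giotto object g that already holds normalized expression values and a "Delaunay_network" spatial network (for example created with createSpatialNetwork and minimal_k > 0); g, the cluster name "spat_clus" and k = 10 are placeholder choices, and the argument names follow the signatures shown above.

spat_cor_obj <- detectSpatialCorFeats(
    gobject = g,
    method = "network",
    spatial_network_name = "Delaunay_network"
)
spat_cor_obj <- clusterSpatialCorFeats(spat_cor_obj, name = "spat_clus", k = 10)
showSpatialCorFeats(spat_cor_obj, use_clus_name = "spat_clus", min_spat_cor = 0.5)
heatmSpatialCorFeats(g, spatCorObject = spat_cor_obj, use_clus_name = "spat_clus")
rankSpatialCorGroups(g, spatCorObject = spat_cor_obj, use_clus_name = "spat_clus")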
- - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = heatm, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + + + ## create heatmap + heatm <- ComplexHeatmap::Heatmap( + matrix = as.matrix(cor_matrix), + cluster_rows = hclust_part, + cluster_columns = hclust_part, + show_row_dend = show_row_dend, + show_column_dend = show_column_dend, + show_row_names = show_row_names, + show_column_names = show_column_names, + top_annotation = ha, ... + ) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = heatm, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -3889,11 +4159,10 @@ heatmSpatialCorFeats = function(gobject, #' @inheritDotParams heatmSpatialCorFeats #' @seealso \code{\link{heatmSpatialCorFeats}} #' @export -heatmSpatialCorGenes = function(...) { +heatmSpatialCorGenes <- function(...) { + .Deprecated(new = "heatmSpatialCorFeats") - .Deprecated(new = "heatmSpatialCorFeats") - - heatmSpatialCorFeats(...) + heatmSpatialCorFeats(...) } @@ -3902,104 +4171,117 @@ heatmSpatialCorGenes = function(...) { #' @title rankSpatialCorGroups #' @name rankSpatialCorGroups -#' @description Rank spatial correlated clusters according to correlation structure +#' @description Rank spatial correlated clusters according to correlation +#' structure #' @param gobject giotto object #' @param spatCorObject spatial correlation object -#' @param use_clus_name name of clusters to visualize (from clusterSpatialCorGenes()) +#' @param use_clus_name name of clusters to visualize +#' (from clusterSpatialCorGenes()) #' @param show_plot show plot #' @param return_plot return ggplot object #' @param save_plot directly save the plot [boolean] -#' @param save_param list of saving parameters, see \code{\link{showSaveParameters}} -#' @param default_save_name default save name for saving, don't change, change save_name in save_param -#' @return data.table with positive (within group) and negative (outside group) scores +#' @param save_param list of saving parameters, see +#' \code{\link{showSaveParameters}} +#' @param default_save_name default save name for saving, don't change, +#' change save_name in save_param +#' @return data.table with positive (within group) and negative +#' (outside group) scores #' @export -rankSpatialCorGroups = function(gobject, - spatCorObject, - use_clus_name = NULL, - show_plot = NULL, - return_plot = FALSE, - save_plot = NULL, - save_param = list(), - default_save_name = 'rankSpatialCorGroups') { - - - ## check input - if(!'spatCorObject' %in% class(spatCorObject)) { - stop('\n spatCorObject needs to be the output from detectSpatialCorFeats() \n') - } - - ## check if cluster exist - if(is.null(use_clus_name)) { - stop('use_clus_name does not exist \n') - } - clusters_part = spatCorObject[['cor_clusters']][[use_clus_name]] - - if(is.null(clusters_part)) { - stop('use_clus_name does not exist \n') - } - - ## create correlation matrix - cor_DT = spatCorObject[['cor_DT']] - cor_DT_dc = data.table::dcast.data.table(cor_DT, formula = feat_ID~variable, value.var = 'spat_cor') - cor_matrix = dt_to_matrix(cor_DT_dc) - - # re-ordering matrix - my_feat_order = spatCorObject[['feat_order']] - cor_matrix = cor_matrix[my_feat_order, my_feat_order] - - - - res_cor_list = 
list() - res_neg_cor_list = list() - nr_feats_list = list() - - for(id in seq_along(unique(clusters_part))) { - - clus_id = unique(clusters_part)[id] - selected_feats = names(clusters_part[clusters_part == clus_id]) - nr_feats_list[[id]] = length(selected_feats) - - sub_cor_matrix = cor_matrix[rownames(cor_matrix) %in% selected_feats, colnames(cor_matrix) %in% selected_feats] - mean_score = mean_flex(sub_cor_matrix) - res_cor_list[[id]] = mean_score - - sub_neg_cor_matrix = cor_matrix[rownames(cor_matrix) %in% selected_feats, !colnames(cor_matrix) %in% selected_feats] - mean_neg_score = mean_flex(sub_neg_cor_matrix) - res_neg_cor_list[[id]] = mean_neg_score - } - - - # data.table variables - cor_neg_adj = cor_neg_score = adj_cor_score = cor_score = clusters = nr_feats = NULL - - res_cor_DT = data.table::data.table('clusters' = unique(clusters_part), - cor_score = unlist(res_cor_list), - cor_neg_score = unlist(res_neg_cor_list), - nr_feats = unlist(nr_feats_list)) - - res_cor_DT[, cor_neg_adj := 1-(cor_neg_score-min(cor_neg_score))] - res_cor_DT[, adj_cor_score := cor_neg_adj * cor_score] - data.table::setorder(res_cor_DT, -adj_cor_score) - res_cor_DT[, clusters := factor(x = clusters, levels = rev(clusters))] - - pl = gg_simple_scatter(data = res_cor_DT, - x = 'clusters', - y = 'adj_cor_score', - size = 'nr_feats', - xlab = 'cluster', - ylab = 'pos r x (1 - (neg_r - min(neg_r)))') - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = res_cor_DT - )) +rankSpatialCorGroups <- function(gobject, + spatCorObject, + use_clus_name = NULL, + show_plot = NULL, + return_plot = FALSE, + save_plot = NULL, + save_param = list(), + default_save_name = "rankSpatialCorGroups") { + ## check input + if (!"spatCorObject" %in% class(spatCorObject)) { + stop("spatCorObject needs to be the output from + detectSpatialCorFeats()") + } + + ## check if cluster exist + if (is.null(use_clus_name)) { + stop("use_clus_name does not exist") + } + clusters_part <- spatCorObject[["cor_clusters"]][[use_clus_name]] + + if (is.null(clusters_part)) { + stop("use_clus_name does not exist") + } + + ## create correlation matrix + cor_DT <- spatCorObject[["cor_DT"]] + cor_DT_dc <- data.table::dcast.data.table( + cor_DT, formula = feat_ID ~ variable, value.var = "spat_cor") + cor_matrix <- dt_to_matrix(cor_DT_dc) + + # re-ordering matrix + my_feat_order <- spatCorObject[["feat_order"]] + cor_matrix <- cor_matrix[my_feat_order, my_feat_order] + + + + res_cor_list <- list() + res_neg_cor_list <- list() + nr_feats_list <- list() + + for (id in seq_along(unique(clusters_part))) { + clus_id <- unique(clusters_part)[id] + selected_feats <- names(clusters_part[clusters_part == clus_id]) + nr_feats_list[[id]] <- length(selected_feats) + + sub_cor_matrix <- cor_matrix[ + rownames(cor_matrix) %in% selected_feats, + colnames(cor_matrix) %in% selected_feats] + mean_score <- mean_flex(sub_cor_matrix) + res_cor_list[[id]] <- mean_score + + sub_neg_cor_matrix <- cor_matrix[ + rownames(cor_matrix) %in% selected_feats, + !colnames(cor_matrix) %in% selected_feats] + mean_neg_score <- mean_flex(sub_neg_cor_matrix) + res_neg_cor_list[[id]] <- mean_neg_score + } + + + # data.table variables + cor_neg_adj <- cor_neg_score <- adj_cor_score <- cor_score <- + clusters <- nr_feats <- NULL + + res_cor_DT <- data.table::data.table( + "clusters" = 
unique(clusters_part), + cor_score = unlist(res_cor_list), + cor_neg_score = unlist(res_neg_cor_list), + nr_feats = unlist(nr_feats_list) + ) + + res_cor_DT[, cor_neg_adj := 1 - (cor_neg_score - min(cor_neg_score))] + res_cor_DT[, adj_cor_score := cor_neg_adj * cor_score] + data.table::setorder(res_cor_DT, -adj_cor_score) + res_cor_DT[, clusters := factor(x = clusters, levels = rev(clusters))] + + pl <- gg_simple_scatter( + data = res_cor_DT, + x = "clusters", + y = "adj_cor_score", + size = "nr_feats", + xlab = "cluster", + ylab = "pos r x (1 - (neg_r - min(neg_r)))" + ) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = res_cor_DT + )) } @@ -4007,9 +4289,11 @@ rankSpatialCorGroups = function(gobject, #' @title getBalancedSpatCoexpressionFeats #' @name getBalancedSpatCoexpressionFeats -#' @description Extract features from spatial co-expression modules in a balanced manner +#' @description Extract features from spatial co-expression modules in a +#' balanced manner #' @param spatCorObject spatial correlation object -#' @param maximum maximum number of genes to get from each spatial co-expression module +#' @param maximum maximum number of genes to get from each spatial +#' co-expression module #' @param rank ranking method (see details) #' @param informed_ranking vector of ranked features #' @param seed seed @@ -4023,157 +4307,172 @@ rankSpatialCorGroups = function(gobject, #' \item{3. informed: }{Features are selected based on prior information/ranking} #' } #' @export -getBalancedSpatCoexpressionFeats = function(spatCorObject, - maximum = 50, - rank = c('weighted', 'random', 'informed'), - informed_ranking = NULL, - seed = NA, - verbose = TRUE) { - - # data.table vars - feat_ID = variable = combo = spat_cor = rnk = feat_id = V1 = NULL - - rank = match.arg(rank, choices = c('weighted', 'random', 'informed')) - - clusters = spatCorObject$cor_clusters$spat_netw_clus - - # rank = random - if(rank == 'random') { - - if(!is.na(seed) & is.numeric(seed)) { - on.exit(random_seed(), add = TRUE) - set.seed(seed) - wrap_msg('Seed has been set for random') - } else { - wrap_msg('Random is selected, but no seed has been set \n - Results might not be fully reproducible \n') - } - - result_list = list() - for(clus in seq_along(unique(clusters))) { - - selected_cluster_features = names(clusters[clusters == clus]) - - feat_length = length(selected_cluster_features) - if(feat_length < maximum) { - maximum_to_use = feat_length - wrap_msg('There are only ', feat_length, ' features for cluster ', clus, '\n', - 'Maximum will be set to ', feat_length, '\n') - } else {maximum_to_use = maximum} - - selected_feats = sample(x = selected_cluster_features, - size = maximum_to_use, - replace = FALSE) - clus_id = rep(clus, length(selected_feats)) - names(clus_id) = selected_feats - result_list[[clus]] = clus_id - } - - final_res = do.call('c', result_list) - - } - - - # rank = random - if(rank == 'weighted') { - - cor_data = spatCorObject$cor_DT - - result_list = list() - for(clus in seq_along(unique(clusters))) { - - if(verbose) print(clus) - - # get all pairwise spatial feature correlations and rank them - selected_cluster_features = names(clusters[clusters == clus]) - subset_cor_data = cor_data[feat_ID %in% selected_cluster_features & variable %in% selected_cluster_features] - subset_cor_data = 
subset_cor_data[feat_ID != variable] - subset_cor_data = dt_sort_combine_two_columns(DT = subset_cor_data, - column1 = 'feat_ID', - column2 = 'variable', myname = 'combo') - subset_cor_data = subset_cor_data[duplicated(combo)] - data.table::setorder(subset_cor_data, -spat_cor) - - # create a ranked data.table - rnk1DT = data.table::data.table(feat_id = subset_cor_data$feat_ID, rnk = seq_along(subset_cor_data$feat_ID)) - rnk2DT = data.table::data.table(feat_id = subset_cor_data$variable, rnk = seq_along(subset_cor_data$variable)) - rnkDT = data.table::rbindlist(list(rnk1DT, rnk2DT)) - data.table::setorder(rnkDT, rnk) - - # summarize rank (weights) - rnkcombined = rnkDT[, sum(rnk), by = feat_id] - data.table::setorder(rnkcombined, V1) - - feat_length = nrow(rnkcombined) - if(feat_length < maximum) { - maximum_to_use = feat_length - wrap_msg('There are only ', feat_length, ' features for cluster ', clus, '\n', - 'Maximum will be set to ', feat_length, '\n') - } else {maximum_to_use = maximum} - - selected_feats = rnkcombined[1:maximum_to_use][['feat_id']] +getBalancedSpatCoexpressionFeats <- function(spatCorObject, + maximum = 50, + rank = c("weighted", "random", "informed"), + informed_ranking = NULL, + seed = NA, + verbose = TRUE) { + # data.table vars + feat_ID <- variable <- combo <- spat_cor <- rnk <- feat_id <- V1 <- NULL + + rank <- match.arg(rank, choices = c("weighted", "random", "informed")) + + clusters <- spatCorObject$cor_clusters$spat_netw_clus + + # rank = random + if (rank == "random") { + if (!is.na(seed) & is.numeric(seed)) { + on.exit(random_seed(), add = TRUE) + set.seed(seed) + wrap_msg("Seed has been set for random") + } else { + wrap_msg("Random is selected, but no seed has been set \n + Results might not be fully reproducible") + } - clus_id = rep(clus, length(selected_feats)) - names(clus_id) = selected_feats - result_list[[clus]] = clus_id + result_list <- list() + for (clus in seq_along(unique(clusters))) { + selected_cluster_features <- names(clusters[clusters == clus]) + + feat_length <- length(selected_cluster_features) + if (feat_length < maximum) { + maximum_to_use <- feat_length + wrap_msg( + "There are only ", feat_length, " features for cluster ", + clus, "\n", + "Maximum will be set to ", feat_length) + } else { + maximum_to_use <- maximum + } + + selected_feats <- sample( + x = selected_cluster_features, + size = maximum_to_use, + replace = FALSE + ) + clus_id <- rep(clus, length(selected_feats)) + names(clus_id) <- selected_feats + result_list[[clus]] <- clus_id + } + final_res <- do.call("c", result_list) } - final_res = do.call('c', result_list) - - } - + # rank = random + if (rank == "weighted") { + cor_data <- spatCorObject$cor_DT + + result_list <- list() + for (clus in seq_along(unique(clusters))) { + if (verbose) print(clus) + + # get all pairwise spatial feature correlations and rank them + selected_cluster_features <- names(clusters[clusters == clus]) + subset_cor_data <- cor_data[ + feat_ID %in% selected_cluster_features & + variable %in% selected_cluster_features] + subset_cor_data <- subset_cor_data[feat_ID != variable] + subset_cor_data <- dt_sort_combine_two_columns( + DT = subset_cor_data, + column1 = "feat_ID", + column2 = "variable", myname = "combo" + ) + subset_cor_data <- subset_cor_data[duplicated(combo)] + data.table::setorder(subset_cor_data, -spat_cor) + + # create a ranked data.table + rnk1DT <- data.table::data.table( + feat_id = subset_cor_data$feat_ID, + rnk = seq_along(subset_cor_data$feat_ID)) + rnk2DT <- 
data.table::data.table( + feat_id = subset_cor_data$variable, + rnk = seq_along(subset_cor_data$variable)) + rnkDT <- data.table::rbindlist(list(rnk1DT, rnk2DT)) + data.table::setorder(rnkDT, rnk) + + # summarize rank (weights) + rnkcombined <- rnkDT[, sum(rnk), by = feat_id] + data.table::setorder(rnkcombined, V1) + + feat_length <- nrow(rnkcombined) + if (feat_length < maximum) { + maximum_to_use <- feat_length + wrap_msg( + "There are only ", feat_length, " features for cluster ", + clus, "\n", + "Maximum will be set to ", feat_length) + } else { + maximum_to_use <- maximum + } + + selected_feats <- rnkcombined[1:maximum_to_use][["feat_id"]] + + clus_id <- rep(clus, length(selected_feats)) + names(clus_id) <- selected_feats + result_list[[clus]] <- clus_id + } - # rank = random - if(rank == 'informed') { - if(is.null(informed_ranking)) { - stop('Informed has been selected, but no informed ranking vector has been provided') + final_res <- do.call("c", result_list) } - # informed_ranking vector should be a ranked gene list - informed_ranking_numerical = seq_along(informed_ranking) - names(informed_ranking_numerical) = informed_ranking - - result_list = list() - for(clus in seq_along(unique(clusters))) { - - selected_cluster_features = names(clusters[clusters == clus]) - - feat_length = length(selected_cluster_features) - if(feat_length < maximum) { - maximum_to_use = feat_length - wrap_msg('There are only ', feat_length, ' features for cluster ', clus, '\n', - 'Maximum will be set to ', feat_length, '\n') - } else {maximum_to_use = maximum} - - informed_subset = informed_ranking_numerical[names(informed_ranking_numerical) %in% selected_cluster_features] - informed_subset = sort(informed_subset) - - feat_length = length(informed_subset) - if(feat_length < maximum) { - maximum_to_use = feat_length - wrap_msg('There are only ', feat_length, ' features for cluster ', clus, '\n', - 'Maximum will be set to ', feat_length, '\n') - } else {maximum_to_use = maximum} - - selected_feats = names(informed_subset[1:maximum_to_use]) + # rank = random + if (rank == "informed") { + if (is.null(informed_ranking)) { + stop("Informed has been selected, but no informed ranking + vector has been provided") + } - clus_id = rep(clus, length(selected_feats)) - names(clus_id) = selected_feats - result_list[[clus]] = clus_id + # informed_ranking vector should be a ranked gene list + informed_ranking_numerical <- seq_along(informed_ranking) + names(informed_ranking_numerical) <- informed_ranking + + result_list <- list() + for (clus in seq_along(unique(clusters))) { + selected_cluster_features <- names(clusters[clusters == clus]) + + feat_length <- length(selected_cluster_features) + if (feat_length < maximum) { + maximum_to_use <- feat_length + wrap_msg( + "There are only ", feat_length, " features for cluster ", + clus, "\n", + "Maximum will be set to ", feat_length) + } else { + maximum_to_use <- maximum + } + + + informed_subset <- informed_ranking_numerical[ + names(informed_ranking_numerical) %in% + selected_cluster_features] + informed_subset <- sort(informed_subset) + + feat_length <- length(informed_subset) + if (feat_length < maximum) { + maximum_to_use <- feat_length + wrap_msg( + "There are only ", feat_length, " features for cluster ", + clus, "\n", + "Maximum will be set to ", feat_length) + } else { + maximum_to_use <- maximum + } + + selected_feats <- names(informed_subset[1:maximum_to_use]) + + clus_id <- rep(clus, length(selected_feats)) + names(clus_id) <- selected_feats + result_list[[clus]] <- 
clus_id + } + final_res <- do.call("c", result_list) } - final_res = do.call('c', result_list) - - } - - return(final_res) - - + return(final_res) } @@ -4193,155 +4492,167 @@ getBalancedSpatCoexpressionFeats = function(spatCorObject, #' @param pattern_name name of spatial pattern #' @param pattern_cell_ids cell ids that make up the spatial pattern #' @param gene_name selected gene -#' @param spatial_prob probability for a high expressing gene value to be part of the spatial pattern +#' @param spatial_prob probability for a high expressing gene value to be +#' part of the spatial pattern #' @param gradient_direction direction of gradient #' @param show_pattern show the discrete spatial pattern #' @param pattern_colors 2 color vector for the spatial pattern #' @param \dots additional parameters for (re-)normalizing -#' @return Reprocessed Giotto object for which one gene has a forced spatial pattern +#' @return Reprocessed Giotto object for which one gene has a forced +#' spatial pattern #' @export -simulateOneGenePatternGiottoObject = function(gobject, - pattern_name = 'pattern', - pattern_cell_ids = NULL, - gene_name = NULL, - spatial_prob = 0.95, - gradient_direction = NULL, - show_pattern = TRUE, - pattern_colors = c('in' = 'green', 'out' = 'red'), - ...) { - - # data.table variables - cell_ID = sdimx_y = sdimx = sdimy = NULL - - if(is.null(pattern_cell_ids)) { - stop('pattern_cell_ids can not be NULL \n') - } +simulateOneGenePatternGiottoObject <- function(gobject, + pattern_name = "pattern", + pattern_cell_ids = NULL, + gene_name = NULL, + spatial_prob = 0.95, + gradient_direction = NULL, + show_pattern = TRUE, + pattern_colors = c("in" = "green", "out" = "red"), + ...) { + # data.table variables + cell_ID <- sdimx_y <- sdimx <- sdimy <- NULL - ## create and add annotation for pattern - cell_meta = pDataDT(gobject) - cell_meta[, (pattern_name) := ifelse(cell_ID %in% pattern_cell_ids, 'in', 'out')] + if (is.null(pattern_cell_ids)) { + stop("pattern_cell_ids can not be NULL") + } - newgobject = addCellMetadata( - gobject, - new_metadata = cell_meta[,c('cell_ID', pattern_name), with = FALSE], - by_column = TRUE, - column_cell_ID = 'cell_ID' - ) + ## create and add annotation for pattern + cell_meta <- pDataDT(gobject) + cell_meta[, (pattern_name) := ifelse( + cell_ID %in% pattern_cell_ids, "in", "out")] - # show pattern - if(show_pattern == TRUE) { - spatPlot2D(gobject = newgobject, save_plot = F, cell_color_code = pattern_colors, - point_size = 2, cell_color = pattern_name) - } + newgobject <- addCellMetadata( + gobject, + new_metadata = cell_meta[, c("cell_ID", pattern_name), with = FALSE], + by_column = TRUE, + column_cell_ID = "cell_ID" + ) + # show pattern + if (show_pattern == TRUE) { + spatPlot2D( + gobject = newgobject, + save_plot = FALSE, + cell_color_code = pattern_colors, + point_size = 2, + cell_color = pattern_name + ) + } - ## merge cell metadata and cell coordinate data - cell_meta = pDataDT(newgobject) - # cell_coord = newgobject@spatial_locs[['raw']] - cell_coord = get_spatial_locations(newgobject, - spat_loc_name = 'raw', - output = 'data.table', - copy_obj = TRUE) - cell_meta = data.table::merge.data.table(cell_meta, cell_coord, by = 'cell_ID') - ## get number of cells within pattern - cell_number = nrow(cell_meta[get(pattern_name) == 'in']) + ## merge cell metadata and cell coordinate data + cell_meta <- pDataDT(newgobject) + cell_coord <- getSpatialLocations(newgobject, + name = "raw", + output = "data.table", + copy_obj = TRUE + ) + cell_meta <- 
data.table::merge.data.table( + cell_meta, cell_coord, by = "cell_ID") + ## get number of cells within pattern + cell_number <- nrow(cell_meta[get(pattern_name) == "in"]) - ## normalized expression - expr_data = newgobject@norm_expr - result_list = list() - ## raw expression - raw_expr_data = newgobject@raw_exprs - raw_result_list = list() + ## normalized expression + expr_data <- newgobject@norm_expr + result_list <- list() + ## raw expression + raw_expr_data <- newgobject@raw_exprs + raw_result_list <- list() - ## create the spatial expression pattern for the specified gene - # 1. rank all gene values from the cells from high to low - # 2. move the highest expressing values to the spatial pattern using a probability - # - 0.5 is the control = random - # - 1 is perfection: all the highest values go to the pattern - # - 0.5 to 1 is decreasing noise levels - if(is.null(gene_name)) stop('a gene name needs to be provided') + ## create the spatial expression pattern for the specified gene + # 1. rank all gene values from the cells from high to low + # 2. move the highest expressing values to the spatial pattern using a + # probability + # - 0.5 is the control = random + # - 1 is perfection: all the highest values go to the pattern + # - 0.5 to 1 is decreasing noise levels + if (is.null(gene_name)) stop("a gene name needs to be provided") - # rank genes - gene_vector = expr_data[rownames(expr_data) == gene_name, ] - sort_expr_gene = sort(gene_vector, decreasing = T) - # number of cells in and out the pattern - total_cell_number = length(sort_expr_gene) - remaining_cell_number = total_cell_number - cell_number + # rank genes + gene_vector <- expr_data[rownames(expr_data) == gene_name, ] + sort_expr_gene <- sort(gene_vector, decreasing = TRUE) - # calculate outside probability - outside_prob = 1 - spatial_prob - prob_vector = c(rep(spatial_prob, cell_number), rep(outside_prob, remaining_cell_number)) + # number of cells in and out the pattern + total_cell_number <- length(sort_expr_gene) + remaining_cell_number <- total_cell_number - cell_number - # first get the 'in' pattern sample values randomly - sample_values = sample(sort_expr_gene, replace = F, size = cell_number, prob = prob_vector) + # calculate outside probability + outside_prob <- 1 - spatial_prob + prob_vector <- c( + rep(spatial_prob, cell_number), + rep(outside_prob, remaining_cell_number)) - # then take the remaining 'out' pattern values randomly - remain_values = sort_expr_gene[!names(sort_expr_gene) %in% names(sample_values)] - remain_values = sample(remain_values, size = length(remain_values)) + # first get the 'in' pattern sample values randomly + sample_values <- sample( + sort_expr_gene, replace = FALSE, size = cell_number, prob = prob_vector) + # then take the remaining 'out' pattern values randomly + remain_values <- sort_expr_gene[ + !names(sort_expr_gene) %in% names(sample_values)] + remain_values <- sample(remain_values, size = length(remain_values)) - ## A. within pattern ## - # ------------------- # - in_cell_meta = cell_meta[get(pattern_name) == 'in'] - # if gradient is wanted - # does not work with 0.5!! is not random!! - if(!is.null(gradient_direction)) { - # sort in_ids according to x, y or xy coordinates to create gradient - in_cell_meta[, sdimx_y := abs(sdimx)+ abs(sdimy)] - # order according to gradient direction - in_cell_meta = in_cell_meta[order(get(gradient_direction))] - } - in_ids = in_cell_meta$cell_ID + ## A. 
within pattern ## + # ------------------- # + in_cell_meta <- cell_meta[get(pattern_name) == "in"] - # preparation for raw matrix - sample_values_id_vector = names(sample_values) - names(sample_values_id_vector) = in_ids + # if gradient is wanted + # does not work with 0.5!! is not random!! + if (!is.null(gradient_direction)) { + # sort in_ids according to x, y or xy coordinates to create gradient + in_cell_meta[, sdimx_y := abs(sdimx) + abs(sdimy)] + # order according to gradient direction + in_cell_meta <- in_cell_meta[order(get(gradient_direction))] + } + in_ids <- in_cell_meta$cell_ID + # preparation for raw matrix + sample_values_id_vector <- names(sample_values) + names(sample_values_id_vector) <- in_ids - ## B. outside pattern ## - # -------------------- # - out_ids = cell_meta[get(pattern_name) == 'out']$cell_ID - # preparation for raw matrix - remain_values_id_vector = names(remain_values) - names(remain_values_id_vector) = out_ids + ## B. outside pattern ## + # -------------------- # + out_ids <- cell_meta[get(pattern_name) == "out"]$cell_ID + # preparation for raw matrix + remain_values_id_vector <- names(remain_values) + names(remain_values_id_vector) <- out_ids - ## raw matrix - # swap the cell ids # - raw_gene_vector = raw_expr_data[rownames(raw_expr_data) == gene_name,] - raw_new_sample_vector = raw_gene_vector[sample_values_id_vector] - names(raw_new_sample_vector) = names(sample_values_id_vector) + ## raw matrix + # swap the cell ids # + raw_gene_vector <- raw_expr_data[rownames(raw_expr_data) == gene_name, ] - raw_new_remain_vector = raw_gene_vector[remain_values_id_vector] - names(raw_new_remain_vector) = names(remain_values_id_vector) + raw_new_sample_vector <- raw_gene_vector[sample_values_id_vector] + names(raw_new_sample_vector) <- names(sample_values_id_vector) - new_sim_raw_values = c(raw_new_sample_vector, raw_new_remain_vector) - new_sim_raw_values = new_sim_raw_values[names(raw_gene_vector)] + raw_new_remain_vector <- raw_gene_vector[remain_values_id_vector] + names(raw_new_remain_vector) <- names(remain_values_id_vector) - # change the original matrices - raw_expr_data[rownames(raw_expr_data) == gene_name,] = new_sim_raw_values - newgobject@raw_exprs = raw_expr_data + new_sim_raw_values <- c(raw_new_sample_vector, raw_new_remain_vector) + new_sim_raw_values <- new_sim_raw_values[names(raw_gene_vector)] - # recalculate normalized values - newgobject <- normalizeGiotto(gobject = newgobject, ...) - newgobject <- addStatistics(gobject = newgobject) + # change the original matrices + raw_expr_data[rownames(raw_expr_data) == gene_name, ] <- new_sim_raw_values + newgobject@raw_exprs <- raw_expr_data - return(newgobject) + # recalculate normalized values + newgobject <- normalizeGiotto(gobject = newgobject, ...) 
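+    # the raw counts of `gene_name` were reassigned above, so the normalized
+    # values (recomputed just above) and the statistics added next are both
+    # derived from the modified matrix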
+ newgobject <- addStatistics(gobject = newgobject) + return(newgobject) } @@ -4353,274 +4664,296 @@ simulateOneGenePatternGiottoObject = function(gobject, #' @name run_spatial_sim_tests_one_rep #' @description runs all spatial tests for 1 probability and 1 rep #' @keywords internal -run_spatial_sim_tests_one_rep = function(gobject, - pattern_name = 'pattern', - pattern_cell_ids = NULL, - gene_name = NULL, - spatial_prob = 0.95, - show_pattern = FALSE, - spatial_network_name = 'kNN_network', - spat_methods = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - spat_methods_params = list(NA, NA, NA, NA, NA), - spat_methods_names = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - save_plot = FALSE, - save_raw = FALSE, - save_norm = FALSE, - save_dir = '~', - save_name = 'plot', - run_simulations = TRUE, - ...) { - - - # data.table variables - genes = prob = time = adj.p.value = method = p.val = sd = qval = pval = g = adjusted_pvalue = NULL - - ## test if spat_methods, params and names have the same length - if(length(spat_methods) != length(spat_methods_params)) { - stop('number of spatial detection methods to test need to be equal to number of spatial methods parameters \n') - } - if(length(spat_methods) != length(spat_methods_names)) { - stop('number of spatial detection methods to test need to be equal to number of spatial methods names \n') - } - - - ## simulate pattern ## - simulate_patch = simulateOneGenePatternGiottoObject(gobject, - pattern_name = pattern_name, - pattern_cell_ids = pattern_cell_ids, - gene_name = gene_name, - spatial_prob = spatial_prob, - gradient_direction = NULL, - show_pattern = show_pattern, - ...) - - # save plot - if(save_plot == TRUE) { - - spatFeatPlot2D(simulate_patch, expression_values = 'norm', feats = gene_name, - point_shape = 'border', point_border_stroke = 0.1, point_size = 2.5, - cow_n_col = 1, show_plot = F, - save_plot = T, save_param = list(save_dir = save_dir, save_folder = pattern_name, - save_name = save_name, - base_width = 9, base_height = 7, units = 'cm')) - - } - - # save raw data - if(save_raw == TRUE) { - - folder_path = paste0(save_dir, '/', pattern_name) - if(!file.exists(folder_path)) dir.create(folder_path, recursive = TRUE) - - write.table(x = as.matrix(simulate_patch@raw_exprs), - file = paste0(save_dir, '/', pattern_name,'/', save_name, '_raw_data.txt'), - sep = '\t') - } - - # save normalized data - if(save_norm == TRUE) { - - folder_path = paste0(save_dir, '/', pattern_name) - if(!file.exists(folder_path)) dir.create(folder_path, recursive = TRUE) - - write.table(x = as.matrix(simulate_patch@norm_expr), - file = paste0(save_dir, '/', pattern_name,'/', save_name, '_norm_data.txt'), - sep = '\t') - } - - - - ## do simulations ## - if(run_simulations == TRUE) { - - result_list = list() - for(test in seq_along(spat_methods)) { - - # method - selected_method = spat_methods[test] - if(!selected_method %in% c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank')) { - stop(selected_method, ' is not a know spatial method \n') - } - - # params - selected_params = spat_methods_params[[test]] - - if(length(selected_params) == 1) { - - if(is.na(selected_params)) { - - if(selected_method == 'binSpect_single') { - selected_params = list(bin_method = 'kmeans', - nstart = 3, - iter_max = 10, - expression_values = 'normalized', - get_av_expr = FALSE, - get_high_expr = FALSE) - - } else if(selected_method == 'binSpect_multi') { - selected_params = list(bin_method = 
'kmeans', - spatial_network_k = c(5, 10, 20), - nstart = 3, - iter_max = 10, - expression_values = 'normalized', - get_av_expr = FALSE, - get_high_expr = FALSE, - summarize = 'adj.p.value') - - } else if(selected_method == 'spatialDE') { - selected_params = list(expression_values = 'raw', - sig_alpha = 0.5, - unsig_alpha = 0.5, - show_plot = FALSE, - return_plot = FALSE, - save_plot = FALSE) - - - } else if(selected_method == 'spark') { - selected_params = list(expression_values = 'raw', - return_object = 'data.table', - percentage = 0.1, - min_count = 10, - num_core = 5) - - } else if(selected_method == 'silhouetteRank') { - selected_params = list(expression_values = 'normalized', - overwrite_input_bin = FALSE, - rbp_ps = c(0.95, 0.99), - examine_tops = c(0.005, 0.010), - matrix_type = "dissim", - num_core = 4, - parallel_path = "/usr/bin", - output = NULL, - query_sizes = 10L) - - } - - } - - } - - # name - selected_name = spat_methods_names[test] - - - ## RUN Spatial Analysis ## - if(selected_method == 'binSpect_single') { - - start = proc.time() - spatial_gene_results = do.call('binSpectSingle', c(gobject = simulate_patch, - selected_params)) - - spatial_gene_results = spatial_gene_results[genes == gene_name] - total_time = proc.time() - start - - spatial_gene_results[, prob := spatial_prob] - spatial_gene_results[, time := total_time[['elapsed']] ] - - spatial_gene_results = spatial_gene_results[,.(genes, adj.p.value, prob, time)] - colnames(spatial_gene_results) = c('genes', 'adj.p.value', 'prob', 'time') - - spatial_gene_results[, method := selected_name] - - - } else if(selected_method == 'binSpect_multi') { - - start = proc.time() - spatial_gene_results = do.call('binSpectMulti', c(gobject = simulate_patch, - selected_params)) - - spatial_gene_results = spatial_gene_results$simple - spatial_gene_results = spatial_gene_results[genes == gene_name] - total_time = proc.time() - start - - spatial_gene_results[, prob := spatial_prob] - spatial_gene_results[, time := total_time[['elapsed']] ] - - spatial_gene_results = spatial_gene_results[,.(genes, p.val, prob, time)] - colnames(spatial_gene_results) = c('genes', 'adj.p.value', 'prob', 'time') - - spatial_gene_results[, method := selected_name] - - - } else if(selected_method == 'spatialDE') { - - start = proc.time() - new_raw_sim_matrix = simulate_patch@raw_exprs - sd_cells = apply(new_raw_sim_matrix, 2, sd) - sd_non_zero_cells = names(sd_cells[sd_cells != 0]) - simulate_patch_fix = subsetGiotto(simulate_patch, cell_ids = sd_non_zero_cells) - - spatial_gene_results = do.call('spatialDE', c(gobject = simulate_patch_fix, - selected_params)) - - spatialDE_spatialgenes_sim_res = spatial_gene_results$results$results - if(is.null(spatialDE_spatialgenes_sim_res)) spatialDE_spatialgenes_sim_res = spatial_gene_results$results - - spatialDE_spatialgenes_sim_res = data.table::as.data.table(spatialDE_spatialgenes_sim_res) - data.table::setorder(spatialDE_spatialgenes_sim_res, qval, pval) - spatialDE_result = spatialDE_spatialgenes_sim_res[g == gene_name] - - spatialDE_time = proc.time() - start - - spatialDE_result[, prob := spatial_prob] - spatialDE_result[, time := spatialDE_time[['elapsed']] ] - - spatial_gene_results = spatialDE_result[,.(g, qval, prob, time)] - colnames(spatial_gene_results) = c('genes', 'adj.p.value', 'prob', 'time') - spatial_gene_results[, method := 'spatialDE'] - - - } else if(selected_method == 'spark') { - - ## spark - start = proc.time() - spark_spatialgenes_sim = do.call('spark', c(gobject = simulate_patch, - 
selected_params)) - - spark_result = spark_spatialgenes_sim[genes == gene_name] - spark_time = proc.time() - start - - spark_result[, prob := spatial_prob] - spark_result[, time := spark_time[['elapsed']] ] - - spatial_gene_results = spark_result[,.(genes, adjusted_pvalue, prob, time)] - colnames(spatial_gene_results) = c('genes', 'adj.p.value', 'prob', 'time') - spatial_gene_results[, method := 'spark'] +run_spatial_sim_tests_one_rep <- function(gobject, + pattern_name = "pattern", + pattern_cell_ids = NULL, + gene_name = NULL, + spatial_prob = 0.95, + show_pattern = FALSE, + spatial_network_name = "kNN_network", + spat_methods = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + spat_methods_params = list(NA, NA, NA, NA, NA), + spat_methods_names = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + save_plot = FALSE, + save_raw = FALSE, + save_norm = FALSE, + save_dir = "~", + save_name = "plot", + run_simulations = TRUE, + ...) { + # data.table variables + genes <- prob <- time <- adj.p.value <- method <- p.val <- sd <- + qval <- pval <- g <- adjusted_pvalue <- NULL + ## test if spat_methods, params and names have the same length + if (length(spat_methods) != length(spat_methods_params)) { + stop("number of spatial detection methods to test need to be equal + to number of spatial methods parameters") + } + if (length(spat_methods) != length(spat_methods_names)) { + stop("number of spatial detection methods to test need to be equal + to number of spatial methods names") + } - } else if(selected_method == 'silhouetteRank') { - ## silhouetterank - start = proc.time() + ## simulate pattern ## + simulate_patch <- simulateOneGenePatternGiottoObject(gobject, + pattern_name = pattern_name, + pattern_cell_ids = pattern_cell_ids, + gene_name = gene_name, + spatial_prob = spatial_prob, + gradient_direction = NULL, + show_pattern = show_pattern, + ... 
+ ) - spatial_gene_results = do.call('silhouetteRankTest', c(gobject = simulate_patch, - selected_params)) + # save plot + if (save_plot == TRUE) { + spatFeatPlot2D(simulate_patch, + expression_values = "norm", + feats = gene_name, + point_shape = "border", + point_border_stroke = 0.1, + point_size = 2.5, + cow_n_col = 1, show_plot = FALSE, + save_plot = TRUE, + save_param = list( + save_dir = save_dir, save_folder = pattern_name, + save_name = save_name, + base_width = 9, base_height = 7, units = "cm" + ) + ) + } - data.table::setnames(spatial_gene_results, old = 'gene', new = 'genes') - spatial_gene_results = spatial_gene_results[genes == gene_name] - silh_time = proc.time() - start + # save raw data + if (save_raw == TRUE) { + folder_path <- paste0(save_dir, "/", pattern_name) + if (!file.exists(folder_path)) dir.create(folder_path, recursive = TRUE) + + write.table( + x = as.matrix(simulate_patch@raw_exprs), + file = paste0( + save_dir, "/", pattern_name, "/", save_name, "_raw_data.txt"), + sep = "\t" + ) + } - spatial_gene_results[, prob := spatial_prob] - spatial_gene_results[, time := silh_time[['elapsed']] ] + # save normalized data + if (save_norm == TRUE) { + folder_path <- paste0(save_dir, "/", pattern_name) + if (!file.exists(folder_path)) dir.create(folder_path, recursive = TRUE) + + write.table( + x = as.matrix(simulate_patch@norm_expr), + file = paste0( + save_dir, "/", pattern_name, "/", save_name, "_norm_data.txt"), + sep = "\t" + ) + } - # silhrank uses qval by default - spatial_gene_results = spatial_gene_results[,.(genes, qval, prob, time)] - colnames(spatial_gene_results) = c('genes', 'adj.p.value', 'prob', 'time') - spatial_gene_results[, method := 'silhouette'] - } - result_list[[test]] = spatial_gene_results + ## do simulations ## + if (run_simulations == TRUE) { + result_list <- list() + for (test in seq_along(spat_methods)) { + # method + selected_method <- spat_methods[test] + if (!selected_method %in% + c("binSpect_single", "binSpect_multi", "spatialDE", "spark", + "silhouetteRank")) { + stop(selected_method, " is not a know spatial method") + } + + # params + selected_params <- spat_methods_params[[test]] + + if (length(selected_params) == 1) { + if (is.na(selected_params)) { + if (selected_method == "binSpect_single") { + selected_params <- list( + bin_method = "kmeans", + nstart = 3, + iter_max = 10, + expression_values = "normalized", + get_av_expr = FALSE, + get_high_expr = FALSE + ) + } else if (selected_method == "binSpect_multi") { + selected_params <- list( + bin_method = "kmeans", + spatial_network_k = c(5, 10, 20), + nstart = 3, + iter_max = 10, + expression_values = "normalized", + get_av_expr = FALSE, + get_high_expr = FALSE, + summarize = "adj.p.value" + ) + } else if (selected_method == "spatialDE") { + selected_params <- list( + expression_values = "raw", + sig_alpha = 0.5, + unsig_alpha = 0.5, + show_plot = FALSE, + return_plot = FALSE, + save_plot = FALSE + ) + } else if (selected_method == "spark") { + selected_params <- list( + expression_values = "raw", + return_object = "data.table", + percentage = 0.1, + min_count = 10, + num_core = 5 + ) + } else if (selected_method == "silhouetteRank") { + selected_params <- list( + expression_values = "normalized", + overwrite_input_bin = FALSE, + rbp_ps = c(0.95, 0.99), + examine_tops = c(0.005, 0.010), + matrix_type = "dissim", + num_core = 4, + parallel_path = "/usr/bin", + output = NULL, + query_sizes = 10L + ) + } + } + } + + # name + selected_name <- spat_methods_names[test] + + + ## RUN Spatial 
Analysis ## + if (selected_method == "binSpect_single") { + start <- proc.time() + spatial_gene_results <- do.call("binSpectSingle", c( + gobject = simulate_patch, + selected_params + )) + + spatial_gene_results <- spatial_gene_results[genes == gene_name] + total_time <- proc.time() - start + + spatial_gene_results[, prob := spatial_prob] + spatial_gene_results[, time := total_time[["elapsed"]]] + + spatial_gene_results <- spatial_gene_results[ + , .(genes, adj.p.value, prob, time)] + colnames(spatial_gene_results) <- c( + "genes", "adj.p.value", "prob", "time") + + spatial_gene_results[, method := selected_name] + } else if (selected_method == "binSpect_multi") { + start <- proc.time() + spatial_gene_results <- do.call("binSpectMulti", c( + gobject = simulate_patch, + selected_params + )) + + spatial_gene_results <- spatial_gene_results$simple + spatial_gene_results <- spatial_gene_results[genes == gene_name] + total_time <- proc.time() - start + + spatial_gene_results[, prob := spatial_prob] + spatial_gene_results[, time := total_time[["elapsed"]]] + + spatial_gene_results <- spatial_gene_results[ + , .(genes, p.val, prob, time)] + colnames(spatial_gene_results) <- c( + "genes", "adj.p.value", "prob", "time") + + spatial_gene_results[, method := selected_name] + } else if (selected_method == "spatialDE") { + start <- proc.time() + new_raw_sim_matrix <- simulate_patch@raw_exprs + sd_cells <- apply(new_raw_sim_matrix, 2, sd) + sd_non_zero_cells <- names(sd_cells[sd_cells != 0]) + simulate_patch_fix <- subsetGiotto( + simulate_patch, cell_ids = sd_non_zero_cells) + + spatial_gene_results <- do.call("spatialDE", c( + gobject = simulate_patch_fix, + selected_params + )) + + spatialDE_spatialgenes_sim_res <- spatial_gene_results$results$results + if (is.null(spatialDE_spatialgenes_sim_res)) + spatialDE_spatialgenes_sim_res <- spatial_gene_results$results + + spatialDE_spatialgenes_sim_res <- data.table::as.data.table( + spatialDE_spatialgenes_sim_res) + data.table::setorder(spatialDE_spatialgenes_sim_res, qval, pval) + spatialDE_result <- spatialDE_spatialgenes_sim_res[ + g == gene_name] + + spatialDE_time <- proc.time() - start + + spatialDE_result[, prob := spatial_prob] + spatialDE_result[, time := spatialDE_time[["elapsed"]]] + + spatial_gene_results <- spatialDE_result[ + , .(g, qval, prob, time)] + colnames(spatial_gene_results) <- c( + "genes", "adj.p.value", "prob", "time") + spatial_gene_results[, method := "spatialDE"] + } else if (selected_method == "spark") { + ## spark + start <- proc.time() + spark_spatialgenes_sim <- do.call("spark", c( + gobject = simulate_patch, + selected_params + )) + + spark_result <- spark_spatialgenes_sim[genes == gene_name] + spark_time <- proc.time() - start + + spark_result[, prob := spatial_prob] + spark_result[, time := spark_time[["elapsed"]]] + + spatial_gene_results <- spark_result[ + , .(genes, adjusted_pvalue, prob, time)] + colnames(spatial_gene_results) <- c( + "genes", "adj.p.value", "prob", "time") + spatial_gene_results[, method := "spark"] + } else if (selected_method == "silhouetteRank") { + ## silhouetterank + start <- proc.time() + + spatial_gene_results <- do.call("silhouetteRankTest", c( + gobject = simulate_patch, + selected_params + )) + + data.table::setnames( + spatial_gene_results, old = "gene", new = "genes") + spatial_gene_results <- spatial_gene_results[genes == gene_name] + silh_time <- proc.time() - start + + spatial_gene_results[, prob := spatial_prob] + spatial_gene_results[, time := silh_time[["elapsed"]]] + + # silhrank 
uses qval by default + spatial_gene_results <- spatial_gene_results[ + , .(genes, qval, prob, time)] + colnames(spatial_gene_results) <- c( + "genes", "adj.p.value", "prob", "time") + spatial_gene_results[, method := "silhouette"] + } + + result_list[[test]] <- spatial_gene_results + } + results <- data.table::rbindlist(l = result_list) + return(results) + } else { + return(NULL) } - - results = data.table::rbindlist(l = result_list) - return(results) - - } else { - return(NULL) - } - } @@ -4629,89 +4962,77 @@ run_spatial_sim_tests_one_rep = function(gobject, #' @title run_spatial_sim_tests_multi #' @name run_spatial_sim_tests_multi -#' @description runs all spatial tests for multiple probabilities and repetitions +#' @description runs all spatial tests for multiple probabilities and +#' repetitions #' @keywords internal -run_spatial_sim_tests_multi = function(gobject, - pattern_name = 'pattern', - pattern_cell_ids = NULL, - gene_name = NULL, - spatial_probs = c(0.5, 1), - reps = 2, - - spatial_network_name = 'kNN_network', - spat_methods = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - spat_methods_params = list(NA, NA, NA, NA, NA), - spat_methods_names = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - - save_plot = FALSE, - save_raw = FALSE, - save_norm = FALSE, - save_dir = '~', - verbose = TRUE, - run_simulations = TRUE, - ... ) { - - - prob_list = list() - for(prob_ind in seq_along(spatial_probs)) { - - prob_i = spatial_probs[prob_ind] - - if(verbose) cat('\n \n start with ', prob_i, '\n \n') - - rep_list = list() - for(rep_i in 1:reps) { - - - if(verbose) cat('\n \n repetitiion = ', rep_i, '\n \n') - - - plot_name = paste0('plot_',gene_name,'_prob', prob_i, '_rep', rep_i) - - - rep_res = run_spatial_sim_tests_one_rep(gobject, - pattern_name = pattern_name, - pattern_cell_ids = pattern_cell_ids, - gene_name = gene_name, - spatial_prob = prob_i, - - spatial_network_name = spatial_network_name, - - spat_methods = spat_methods, - spat_methods_params = spat_methods_params, - spat_methods_names = spat_methods_names, - - save_plot = save_plot, - save_raw = save_raw, - save_norm = save_norm, - - save_dir = save_dir, - save_name = plot_name, - run_simulations = run_simulations, - ...) - - if(run_simulations == TRUE) { - rep_res[, rep := rep_i] - rep_list[[rep_i]] = rep_res - } - +run_spatial_sim_tests_multi <- function(gobject, + pattern_name = "pattern", + pattern_cell_ids = NULL, + gene_name = NULL, + spatial_probs = c(0.5, 1), + reps = 2, + spatial_network_name = "kNN_network", + spat_methods = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + spat_methods_params = list(NA, NA, NA, NA, NA), + spat_methods_names = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + save_plot = FALSE, + save_raw = FALSE, + save_norm = FALSE, + save_dir = "~", + verbose = TRUE, + run_simulations = TRUE, + ...) 
{ + prob_list <- list() + for (prob_ind in seq_along(spatial_probs)) { + prob_i <- spatial_probs[prob_ind] + + if (verbose) cat("start with ", prob_i) + + rep_list <- list() + for (rep_i in 1:reps) { + if (verbose) cat("repetitiion = ", rep_i) + + + plot_name <- paste0("plot_", gene_name, "_prob", + prob_i, "_rep", rep_i) + + + rep_res <- run_spatial_sim_tests_one_rep(gobject, + pattern_name = pattern_name, + pattern_cell_ids = pattern_cell_ids, + gene_name = gene_name, + spatial_prob = prob_i, + spatial_network_name = spatial_network_name, + spat_methods = spat_methods, + spat_methods_params = spat_methods_params, + spat_methods_names = spat_methods_names, + save_plot = save_plot, + save_raw = save_raw, + save_norm = save_norm, + save_dir = save_dir, + save_name = plot_name, + run_simulations = run_simulations, + ... + ) + + if (run_simulations == TRUE) { + rep_res[, rep := rep_i] + rep_list[[rep_i]] <- rep_res + } + } + if (run_simulations == TRUE) { + rep_list_res <- do.call("rbind", rep_list) + prob_list[[prob_ind]] <- rep_list_res + } } - if(run_simulations == TRUE) { - rep_list_res = do.call('rbind', rep_list) - prob_list[[prob_ind]] = rep_list_res + if (run_simulations == TRUE) { + final_gene_results <- do.call("rbind", prob_list) + return(final_gene_results) } - - - } - - if(run_simulations == TRUE) { - final_gene_results = do.call('rbind', prob_list) - return(final_gene_results) - } - - } @@ -4719,18 +5040,22 @@ run_spatial_sim_tests_multi = function(gobject, #' @title runPatternSimulation #' @name runPatternSimulation -#' @description Creates a known spatial pattern for selected genes one-by-one and runs the different spatial gene detection tests +#' @description Creates a known spatial pattern for selected genes one-by-one +#' and runs the different spatial gene detection tests #' @param gobject giotto object #' @param pattern_name name of spatial pattern #' @param pattern_colors 2 color vector for the spatial pattern #' @param pattern_cell_ids cell ids that make up the spatial pattern #' @param gene_names selected genes -#' @param spatial_probs probabilities to test for a high expressing gene value to be part of the spatial pattern +#' @param spatial_probs probabilities to test for a high expressing gene +#' value to be part of the spatial pattern #' @param reps number of random simulation repetitions #' @param spatial_network_name which spatial network to use for binSpectSingle #' @param spat_methods vector of spatial methods to test -#' @param spat_methods_params list of parameters list for each element in the vector of spatial methods to test -#' @param spat_methods_names name for each element in the vector of spatial elements to test +#' @param spat_methods_params list of parameters list for each element in the +#' vector of spatial methods to test +#' @param spat_methods_names name for each element in the vector of spatial +#' elements to test #' @param scalefactor library size scaling factor when re-normalizing dataset #' @param save_plot save intermediate random simulation plots or not #' @param save_raw save the raw expression matrix of the simulation @@ -4743,167 +5068,205 @@ run_spatial_sim_tests_multi = function(gobject, #' @param \dots additional parameters for renormalization #' @return data.table with results #' @export -runPatternSimulation = function(gobject, - pattern_name = 'pattern', - pattern_colors = c('in' = 'green', 'out' = 'red'), - pattern_cell_ids = NULL, - gene_names = NULL, - spatial_probs = c(0.5, 1), - reps = 2, - spatial_network_name = 
'kNN_network', - spat_methods = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - spat_methods_params = list(NA, NA, NA, NA, NA), - spat_methods_names = c('binSpect_single', 'binSpect_multi', 'spatialDE', 'spark', 'silhouetteRank'), - scalefactor = 6000, - save_plot = T, - save_raw = T, - save_norm = T, - save_dir = '~', - max_col = 4, - height = 7, - width = 7, - run_simulations = TRUE, - ...) { - - - # data.table variables - prob = method = adj.p.value = time = NULL - - - # plot pattern for first gene (the same for all) - example_patch = simulateOneGenePatternGiottoObject(gobject, - pattern_name = pattern_name, - pattern_cell_ids = pattern_cell_ids, - gene_name = gene_names[[1]], - spatial_prob = 1, - scalefactor = scalefactor, - verbose = T) - - spatPlot2D(example_patch, cell_color = pattern_name, cell_color_code = pattern_colors, - save_plot = save_plot, save_param = list(save_dir = save_dir, save_folder = 'original', save_name = paste0(pattern_name,'_pattern'), - base_width = 9, base_height = 7, units = 'cm')) - - - all_results = list() - for(gene_ind in seq_along(gene_names)) { - - gene = gene_names[gene_ind] - - # plot original expression - GiottoVisuals::spatFeatPlot2D( - gobject = gobject, - expression_values = 'norm', - feats = gene, - point_shape = 'border', point_border_stroke = 0.1, - show_network = F, network_color = 'lightgrey', - point_size = 2.5, - cow_n_col = 1, - show_plot = FALSE, - save_plot = save_plot, - save_param = list(save_dir = save_dir, save_folder = 'original', - save_name = paste0(gene,'_original'), - base_width = 9, base_height = 7, - units = 'cm') - ) +runPatternSimulation <- function(gobject, + pattern_name = "pattern", + pattern_colors = c("in" = "green", "out" = "red"), + pattern_cell_ids = NULL, + gene_names = NULL, + spatial_probs = c(0.5, 1), + reps = 2, + spatial_network_name = "kNN_network", + spat_methods = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + spat_methods_params = list(NA, NA, NA, NA, NA), + spat_methods_names = c("binSpect_single", "binSpect_multi", "spatialDE", + "spark", "silhouetteRank"), + scalefactor = 6000, + save_plot = TRUE, + save_raw = TRUE, + save_norm = TRUE, + save_dir = "~", + max_col = 4, + height = 7, + width = 7, + run_simulations = TRUE, + ...) { + # data.table variables + prob <- method <- adj.p.value <- time <- NULL - generesults = run_spatial_sim_tests_multi( - gobject, - pattern_name = pattern_name, - pattern_cell_ids = pattern_cell_ids, - gene_name = gene, - spatial_network_name = spatial_network_name, - spat_methods = spat_methods, - spat_methods_params = spat_methods_params, - spat_methods_names = spat_methods_names, - save_plot = save_plot, - save_raw = save_raw, - save_norm = save_norm, - save_dir = save_dir, - spatial_probs = spatial_probs, - reps = reps, - run_simulations = run_simulations, - ... 
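For orientation, a minimal usage sketch of the refactored runPatternSimulation() documented above, using only arguments that appear in this diff. The giotto object `g`, the gene symbols and the chosen pattern cells are placeholders: the sketch assumes a normalized object that already carries a "kNN_network" spatial network and the usual pDataDT() cell-metadata accessor.

    library(Giotto)

    # 'g' is a placeholder for an existing, normalized giotto object
    all_ids <- pDataDT(g)$cell_ID
    # arbitrary "inside pattern" cells, here simply the first half of all cells
    in_pattern <- all_ids[seq_len(floor(length(all_ids) / 2))]

    sim_res <- runPatternSimulation(
        gobject = g,
        pattern_name = "left_half",
        pattern_cell_ids = in_pattern,
        gene_names = c("Gene1", "Gene2"),      # hypothetical gene symbols
        spatial_probs = c(0.5, 0.7, 1),        # pattern strengths to benchmark
        reps = 2,
        spat_methods = c("binSpect_single", "binSpect_multi"),
        spat_methods_params = list(NA, NA),    # NA -> use the per-method defaults defined earlier in this file
        spat_methods_names = c("binSpect_single", "binSpect_multi"),
        save_plot = FALSE, save_raw = FALSE, save_norm = FALSE,
        save_dir = tempdir(),
        run_simulations = TRUE
    )
    # one row per gene x method x probability x repetition,
    # with adj.p.value, prob, time and method columns
    sim_res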
+ # plot pattern for first gene (the same for all) + example_patch <- simulateOneGenePatternGiottoObject(gobject, + pattern_name = pattern_name, + pattern_cell_ids = pattern_cell_ids, + gene_name = gene_names[[1]], + spatial_prob = 1, + scalefactor = scalefactor, + verbose = TRUE ) - if(run_simulations == TRUE) { - generesults[, prob := as.factor(prob)] - uniq_methods = mixedsort(unique(generesults$method)) - generesults[, method := factor(method, levels = uniq_methods)] - - if(save_plot == TRUE) { - - subdir = paste0(save_dir,'/',pattern_name,'/') - if(!file.exists(subdir)) dir.create(path = subdir, recursive = TRUE) - # write results - data.table::fwrite(x = generesults, file = paste0(subdir,'/',gene,'_results.txt'), sep = '\t', quote = F) - - } + spatPlot2D(example_patch, + cell_color = pattern_name, cell_color_code = pattern_colors, + save_plot = save_plot, save_param = list( + save_dir = save_dir, + save_folder = "original", + save_name = paste0(pattern_name, "_pattern"), + base_width = 9, + base_height = 7, + units = "cm" + ) + ) - all_results[[gene_ind]] = generesults + all_results <- list() + for (gene_ind in seq_along(gene_names)) { + gene <- gene_names[gene_ind] + + # plot original expression + GiottoVisuals::spatFeatPlot2D( + gobject = gobject, + expression_values = "norm", + feats = gene, + point_shape = "border", + point_border_stroke = 0.1, + show_network = FALSE, + network_color = "lightgrey", + point_size = 2.5, + cow_n_col = 1, + show_plot = FALSE, + save_plot = save_plot, + save_param = list( + save_dir = save_dir, save_folder = "original", + save_name = paste0(gene, "_original"), + base_width = 9, base_height = 7, + units = "cm" + ) + ) + + + generesults <- run_spatial_sim_tests_multi( + gobject, + pattern_name = pattern_name, + pattern_cell_ids = pattern_cell_ids, + gene_name = gene, + spatial_network_name = spatial_network_name, + spat_methods = spat_methods, + spat_methods_params = spat_methods_params, + spat_methods_names = spat_methods_names, + save_plot = save_plot, + save_raw = save_raw, + save_norm = save_norm, + save_dir = save_dir, + spatial_probs = spatial_probs, + reps = reps, + run_simulations = run_simulations, + ... 
+ ) + + if (run_simulations == TRUE) { + generesults[, prob := as.factor(prob)] + uniq_methods <- mixedsort(unique(generesults$method)) + generesults[, method := factor(method, levels = uniq_methods)] + + if (save_plot == TRUE) { + subdir <- paste0(save_dir, "/", pattern_name, "/") + if (!file.exists(subdir)) dir.create( + path = subdir, recursive = TRUE) + # write results + data.table::fwrite( + x = generesults, + file = paste0(subdir, "/", gene, "_results.txt"), + sep = "\t", quote = FALSE) + } + + all_results[[gene_ind]] <- generesults + } } - } - - - ## create combined results and visuals - if(run_simulations == TRUE) { - - results = do.call('rbind', all_results) - - ## plot results ## - - if(save_plot == TRUE) { - # 4 columns max - nr_rows = max(c(round(length(gene_names)/max_col), 1)) - - # p-values - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_boxplot(data = results, ggplot2::aes(x = method, y = adj.p.value, color = prob)) - pl = pl + ggplot2::geom_point(data = results, ggplot2::aes(x = method, y = adj.p.value, color = prob), size = 2, position = ggplot2::position_jitterdodge()) - pl = pl + ggplot2::theme_bw() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 1, hjust = 1)) - pl = pl + ggplot2::facet_wrap(~genes, nrow = nr_rows) - pl = pl + ggplot2::geom_hline(yintercept = 0.05, color = 'red', linetype = 2) - - grDevices::pdf(file = paste0(save_dir,'/',pattern_name,'_boxplot_pvalues.pdf'), width = width, height = height) - print(pl) - grDevices::dev.off() - - - - # -log10 p-values - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_boxplot(data = results, ggplot2::aes(x = method, y = -log10(adj.p.value), color = prob)) - pl = pl + ggplot2::geom_point(data = results, ggplot2::aes(x = method, y = -log10(adj.p.value), color = prob), size = 2, position = ggplot2::position_jitterdodge()) - pl = pl + ggplot2::theme_bw() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 1, hjust = 1)) - pl = pl + ggplot2::facet_wrap(~genes, nrow = nr_rows) - - grDevices::pdf(file = paste0(save_dir,'/',pattern_name,'_boxplot_log10pvalues.pdf'), width = width, height = height) - print(pl) - grDevices::dev.off() + ## create combined results and visuals + if (run_simulations == TRUE) { + results <- do.call("rbind", all_results) + + ## plot results ## + + if (save_plot == TRUE) { + # 4 columns max + nr_rows <- max(c(round(length(gene_names) / max_col), 1)) + + # p-values + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_boxplot( + data = results, + ggplot2::aes(x = method, y = adj.p.value, color = prob)) + pl <- pl + ggplot2::geom_point( + data = results, + ggplot2::aes(x = method, y = adj.p.value, color = prob), + size = 2, position = ggplot2::position_jitterdodge()) + pl <- pl + ggplot2::theme_bw() + + ggplot2::theme(axis.text.x = ggplot2::element_text( + angle = 90, vjust = 1, hjust = 1)) + pl <- pl + ggplot2::facet_wrap(~genes, nrow = nr_rows) + pl <- pl + ggplot2::geom_hline( + yintercept = 0.05, color = "red", linetype = 2) + + grDevices::pdf(file = paste0( + save_dir, "/", pattern_name, "_boxplot_pvalues.pdf"), + width = width, height = height) + print(pl) + grDevices::dev.off() + + + + # -log10 p-values + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_boxplot( + data = results, + ggplot2::aes(x = method, y = -log10(adj.p.value), color = prob)) + pl <- pl + ggplot2::geom_point( + data = results, + ggplot2::aes(x = method, y = -log10(adj.p.value), color = prob), + size = 2, position = ggplot2::position_jitterdodge()) + pl <- pl + ggplot2::theme_bw() 
+ ggplot2::theme( + axis.text.x = ggplot2::element_text( + angle = 90, vjust = 1, hjust = 1)) + pl <- pl + ggplot2::facet_wrap(~genes, nrow = nr_rows) + + grDevices::pdf(file = paste0( + save_dir, "/", pattern_name, "_boxplot_log10pvalues.pdf"), + width = width, height = height) + print(pl) + grDevices::dev.off() + + + # time + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_boxplot( + data = results, + ggplot2::aes(x = method, y = time, color = prob)) + pl <- pl + ggplot2::geom_point( + data = results, + ggplot2::aes(x = method, y = time, color = prob), size = 2, + position = ggplot2::position_jitterdodge()) + pl <- pl + ggplot2::theme_bw() + ggplot2::theme( + axis.text.x = ggplot2::element_text( + angle = 90, vjust = 1, hjust = 1)) + + grDevices::pdf(file = paste0( + save_dir, "/", pattern_name, "_boxplot_time.pdf"), + width = width, height = height) + print(pl) + grDevices::dev.off() + } - # time - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_boxplot(data = results, ggplot2::aes(x = method, y = time, color = prob)) - pl = pl + ggplot2::geom_point(data = results, ggplot2::aes(x = method, y = time, color = prob), size = 2, position = ggplot2::position_jitterdodge()) - pl = pl + ggplot2::theme_bw() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 1, hjust = 1)) - grDevices::pdf(file = paste0(save_dir,'/',pattern_name,'_boxplot_time.pdf'), width = width, height = height) - print(pl) - grDevices::dev.off() + # write results + data.table::fwrite( + x = results, + file = paste0(save_dir, "/", pattern_name, "_results.txt"), + sep = "\t", quote = FALSE) + return(results) + } else { + return(NULL) } - - - # write results - data.table::fwrite(x = results, file = paste0(save_dir,'/',pattern_name,'_results.txt'), sep = '\t', quote = F) - return(results) - - } else { - return(NULL) - } - } - - diff --git a/R/spatial_interaction.R b/R/spatial_interaction.R index 843223006..045a77059 100644 --- a/R/spatial_interaction.R +++ b/R/spatial_interaction.R @@ -1,84 +1,90 @@ - - - # cell type proximity enrichment #### #' @title make_simulated_network #' @name make_simulated_network #' @description Simulate random network. 
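Before the helper itself, a standalone toy sketch of the label-reshuffling null model behind make_simulated_network(): cell-type labels from both ends of every annotated network edge are pooled, reshuffled, and split back into simulated edge endpoints, from which expected interaction frequencies are counted. The edge table, simulation count and seed below are invented for illustration and are not part of this diff.

    library(data.table)

    # made-up annotated edges: cell type at each end of a spatial network edge
    edges <- data.table(
        from_type = c("A", "A", "B", "B", "C"),
        to_type   = c("B", "A", "C", "B", "A")
    )

    set.seed(1234)
    n_sim <- 3
    sims <- lapply(seq_len(n_sim), function(i) {
        labs <- c(edges$from_type, edges$to_type)
        shuffled <- sample(labs, length(labs), replace = FALSE)  # reshuffle all labels
        half <- length(labs) / 2
        data.table(
            round = paste0("sim", i),
            s1 = shuffled[1:half],                   # simulated "from" cell types
            s2 = shuffled[(half + 1):length(labs)]   # simulated "to" cell types
        )
    })
    sample_dt <- rbindlist(sims)

    # sorted "A--B" style interaction ids, as in the real helper
    sample_dt[, unified_int := paste(sort(c(s1, s2)), collapse = "--"),
              by = seq_len(nrow(sample_dt))]
    # simulated interaction counts per round, i.e. the expected frequencies
    sample_dt[, .N, by = .(round, unified_int)]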
#' @keywords internal -make_simulated_network = function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network_name = 'Delaunay_network', - cluster_column, - number_of_simulations = 100, - set_seed = TRUE, - seed_number = 1234) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # data.table variables - unified_cells = NULL - - spatial_network_annot = annotateSpatialNetwork(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - # remove double edges between same cells # - spatial_network_annot = dt_sort_combine_two_columns(spatial_network_annot, - column1 = 'from', column2 = 'to', - myname = 'unified_cells') - spatial_network_annot = spatial_network_annot[!duplicated(unified_cells)] - - # create a simulated network - length_ints = nrow(spatial_network_annot) - s1_list = list() - s2_list = list() - - all_cell_type = c(spatial_network_annot$from_cell_type, spatial_network_annot$to_cell_type) - middle_point = length(all_cell_type)/2 - - for(sim in 1:number_of_simulations) { - - if(set_seed == TRUE) { - seed_number = seed_number+sim - set.seed(seed = seed_number) - } +make_simulated_network <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + spatial_network_name = "Delaunay_network", + cluster_column, + number_of_simulations = 100, + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - reshuffled_all_cell_type = sample(x = all_cell_type, size = length(all_cell_type), replace = F) + # data.table variables + unified_cells <- NULL - new_from_cell_type = reshuffled_all_cell_type[1:middle_point] - s1_list[[sim]] = new_from_cell_type + spatial_network_annot <- annotateSpatialNetwork( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) + + # remove double edges between same cells # + spatial_network_annot <- dt_sort_combine_two_columns(spatial_network_annot, + column1 = "from", column2 = "to", + myname = "unified_cells" + ) + spatial_network_annot <- spatial_network_annot[!duplicated(unified_cells)] - new_to_cell_type = reshuffled_all_cell_type[(middle_point+1):length(all_cell_type)] - s2_list[[sim]] = new_to_cell_type + # create a simulated network + length_ints <- nrow(spatial_network_annot) + s1_list <- list() + s2_list <- list() - } + all_cell_type <- c(spatial_network_annot$from_cell_type, + spatial_network_annot$to_cell_type) + middle_point <- length(all_cell_type) / 2 - s1_vector = do.call('c', s1_list) - s2_vector = do.call('c', s2_list) - round_vector = rep(x = 1:number_of_simulations, each = length_ints) - round_vector = paste0('sim',round_vector) + for (sim in 1:number_of_simulations) { + if (set_seed == TRUE) { + seed_number <- seed_number + sim + set.seed(seed = seed_number) + } - # data.table variables - s1 = s2 = unified_int = type_int = NULL + reshuffled_all_cell_type <- sample( + x = all_cell_type, size = length(all_cell_type), replace = FALSE) + + new_from_cell_type <- reshuffled_all_cell_type[1:middle_point] + s1_list[[sim]] <- new_from_cell_type + + new_to_cell_type <- 
reshuffled_all_cell_type[ + (middle_point + 1):length(all_cell_type)] + s2_list[[sim]] <- new_to_cell_type + } - sample_dt = data.table::data.table(s1 = s1_vector, s2 = s2_vector, round = round_vector) - uniq_sim_comb = unique(sample_dt[,.(s1,s2)]) - uniq_sim_comb[, unified_int := paste(sort(c(s1,s2)), collapse = '--'), by = 1:nrow(uniq_sim_comb)] - sample_dt[uniq_sim_comb, unified_int := unified_int, on = c(s1 = 's1', s2 = 's2')] - sample_dt[, type_int := ifelse(s1 == s2, 'homo', 'hetero')] + s1_vector <- do.call("c", s1_list) + s2_vector <- do.call("c", s2_list) + round_vector <- rep(x = 1:number_of_simulations, each = length_ints) + round_vector <- paste0("sim", round_vector) - return(sample_dt) + # data.table variables + s1 <- s2 <- unified_int <- type_int <- NULL + sample_dt <- data.table::data.table( + s1 = s1_vector, s2 = s2_vector, round = round_vector) + uniq_sim_comb <- unique(sample_dt[, .(s1, s2)]) + uniq_sim_comb[, unified_int := paste( + sort(c(s1, s2)), collapse = "--"), by = 1:nrow(uniq_sim_comb)] + sample_dt[uniq_sim_comb, unified_int := unified_int, on = c( + s1 = "s1", s2 = "s2")] + sample_dt[, type_int := ifelse(s1 == s2, "homo", "hetero")] + + return(sample_dt) } @@ -91,186 +97,228 @@ make_simulated_network = function(gobject, #' @param feat_type feature type #' @param spatial_network_name name of spatial network to use #' @param cluster_column name of column to use for clusters -#' @param number_of_simulations number of simulations to create expected observations +#' @param number_of_simulations number of simulations to create expected +#' observations #' @param adjust_method method to adjust p.values #' @param set_seed use of seed #' @param seed_number seed number to use -#' @return List of cell Proximity scores (CPscores) in data.table format. The first -#' data.table (raw_sim_table) shows the raw observations of both the original and -#' simulated networks. The second data.table (enrichm_res) shows the enrichment results. -#' @details Spatial proximity enrichment or depletion between pairs of cell types -#' is calculated by calculating the observed over the expected frequency -#' of cell-cell proximity interactions. The expected frequency is the average frequency -#' calculated from a number of spatial network simulations. Each individual simulation is -#' obtained by reshuffling the cell type labels of each node (cell) -#' in the spatial network. +#' @return List of cell Proximity scores (CPscores) in data.table format. The +#' first data.table (raw_sim_table) shows the raw observations of both the +#' original and simulated networks. The second data.table (enrichm_res) shows +#' the enrichment results. +#' @details Spatial proximity enrichment or depletion between pairs of cell +#' types is calculated by calculating the observed over the expected frequency +#' of cell-cell proximity interactions. The expected frequency is the average +#' frequency calculated from a number of spatial network simulations. Each +#' individual simulation is obtained by reshuffling the cell type labels of +#' each node (cell) in the spatial network. 
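A hedged usage sketch of the exported cellProximityEnrichment() described above; `g` and the "cell_types" cluster column are placeholders for an existing giotto object with a Delaunay spatial network. The interpretation comment follows the enrichment definition used later in this function, log2((original + 1) / (simulations + 1)).

    # illustrative call; all argument names exist in the signature below
    cpe <- cellProximityEnrichment(
        gobject = g,
        spatial_network_name = "Delaunay_network",
        cluster_column = "cell_types",     # hypothetical metadata column
        number_of_simulations = 1000,
        adjust_method = "fdr",
        set_seed = TRUE,
        seed_number = 1234
    )

    # enrichm_res holds one row per cell-type pair ("A--B"); positive 'enrichm'
    # (log2 of observed over mean simulated counts, both offset by 1) marks
    # spatial proximity enrichment, negative values mark depletion
    head(cpe$enrichm_res[, .(unified_int, type_int, enrichm,
                             p.adj_higher, p.adj_lower, PI_value)])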
#' @export cellProximityEnrichment <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network_name = 'Delaunay_network', - cluster_column, - number_of_simulations = 1000, - adjust_method = c("none", "fdr", "bonferroni","BH", - "holm", "hochberg", "hommel", - "BY"), - set_seed = TRUE, - seed_number = 1234) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # p.adj test - sel_adjust_method = match.arg(adjust_method, choices = c("none", "fdr", "bonferroni","BH", - "holm", "hochberg", "hommel", - "BY")) - - spatial_network_annot = annotateSpatialNetwork(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - # remove double edges between same cells # - # a simplified network does not have double edges between cells # - # spatial_network_annot[, unified_cells := paste(sort(c(to,from)), collapse = '--'), by = 1:nrow(spatial_network_annot)] - - # data.table variables - unified_cells = type_int = N = NULL - - spatial_network_annot = dt_sort_combine_two_columns(spatial_network_annot, 'to', 'from', 'unified_cells') - spatial_network_annot = spatial_network_annot[!duplicated(unified_cells)] - - sample_dt = make_simulated_network(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column, - number_of_simulations = number_of_simulations, - set_seed = set_seed, - seed_number = seed_number) - - # combine original and simulated network - table_sim_results = sample_dt[, .N, by = c('unified_int', 'type_int', 'round')] - - ## create complete simulations - ## add 0 if no single interaction was found - unique_ints = unique(table_sim_results[,.(unified_int, type_int)]) - - # data.table with 0's for all interactions - minimum_simulations = unique_ints[rep(seq_len(nrow(unique_ints)), number_of_simulations), ] - minimum_simulations[, round := rep(paste0('sim',1:number_of_simulations), each = nrow(unique_ints))] - minimum_simulations[, N := 0] - - table_sim_minimum_results = rbind(table_sim_results, minimum_simulations) - table_sim_minimum_results[, V1 := sum(N), by = c('unified_int', 'type_int', 'round')] - table_sim_minimum_results = unique(table_sim_minimum_results[,.(unified_int, type_int, round, V1)]) - table_sim_results = table_sim_minimum_results - - - # data.table variables - orig = unified_int = V1 = original = enrichm = simulations = NULL - - table_sim_results[, orig := 'simulations'] - spatial_network_annot[, round := 'original'] - - table_orig_results = spatial_network_annot[, .N, by = c('unified_int', 'type_int', 'round')] - table_orig_results[, orig := 'original'] - data.table::setnames(table_orig_results, old = 'N', new = 'V1') - - table_results = rbind(table_orig_results, table_sim_results) - - - - # add missing combinations from original or simulations - # probably not needed anymore - all_simulation_ints = as.character(unique(table_results[orig == 'simulations']$unified_int)) - all_original_ints = as.character(unique(table_results[orig == 'original']$unified_int)) - missing_in_original = all_simulation_ints[!all_simulation_ints %in% all_original_ints] - missing_in_simulations = all_original_ints[!all_original_ints %in% all_simulation_ints] - create_missing_for_original = table_results[unified_int %in% 
missing_in_original] - create_missing_for_original = unique(create_missing_for_original[, c('orig', 'V1') := list('original', 0)]) - create_missing_for_simulations = table_results[unified_int %in% missing_in_simulations] - create_missing_for_simulations = unique(create_missing_for_simulations[, c('orig', 'V1') := list('simulations', 0)]) - - table_results <- do.call('rbind', list(table_results, create_missing_for_original, create_missing_for_simulations)) - - - ## p-values - combo_list = rep(NA, length = length(unique(table_results$unified_int))) - p_high = rep(NA, length = length(unique(table_results$unified_int))) - p_low = rep(NA, length = length(unique(table_results$unified_int))) - - for(int_combo in seq_along(unique(table_results$unified_int))) { - - this_combo = as.character(unique(table_results$unified_int)[int_combo]) + spat_unit = NULL, + feat_type = NULL, + spatial_network_name = "Delaunay_network", + cluster_column, + number_of_simulations = 1000, + adjust_method = c( + "none", "fdr", "bonferroni", "BH", + "holm", "hochberg", "hommel", + "BY" + ), + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - sub = table_results[unified_int == this_combo] + # p.adj test + sel_adjust_method <- match.arg(adjust_method, choices = c( + "none", "fdr", "bonferroni", "BH", + "holm", "hochberg", "hommel", + "BY" + )) + + spatial_network_annot <- annotateSpatialNetwork( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) - orig_value = sub[orig == 'original']$V1 - sim_values = sub[orig == 'simulations']$V1 + # remove double edges between same cells # + # a simplified network does not have double edges between cells # + + # data.table variables + unified_cells <- type_int <- N <- NULL + + spatial_network_annot <- dt_sort_combine_two_columns( + spatial_network_annot, "to", "from", "unified_cells") + spatial_network_annot <- spatial_network_annot[!duplicated(unified_cells)] + + sample_dt <- make_simulated_network( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column, + number_of_simulations = number_of_simulations, + set_seed = set_seed, + seed_number = seed_number + ) - length_simulations = length(sim_values) - if(length_simulations != number_of_simulations) { - additional_length_needed = number_of_simulations-length_simulations - sim_values = c(sim_values, rep(0, additional_length_needed)) - #length_simulations = c(length_simulations, rep(0, additional_length_needed)) + # combine original and simulated network + table_sim_results <- sample_dt[, .N, by = c( + "unified_int", "type_int", "round")] + + ## create complete simulations + ## add 0 if no single interaction was found + unique_ints <- unique(table_sim_results[, .(unified_int, type_int)]) + + # data.table with 0's for all interactions + minimum_simulations <- unique_ints[rep( + seq_len(nrow(unique_ints)), number_of_simulations), ] + minimum_simulations[, round := rep( + paste0("sim", 1:number_of_simulations), each = nrow(unique_ints))] + minimum_simulations[, N := 0] + + table_sim_minimum_results <- rbind(table_sim_results, minimum_simulations) + table_sim_minimum_results[, V1 := sum(N), by = c( + "unified_int", "type_int", 
"round")] + table_sim_minimum_results <- unique( + table_sim_minimum_results[, .(unified_int, type_int, round, V1)]) + table_sim_results <- table_sim_minimum_results + + + # data.table variables + orig <- unified_int <- V1 <- original <- enrichm <- simulations <- NULL + + table_sim_results[, orig := "simulations"] + spatial_network_annot[, round := "original"] + + table_orig_results <- spatial_network_annot[, .N, by = c( + "unified_int", "type_int", "round")] + table_orig_results[, orig := "original"] + data.table::setnames(table_orig_results, old = "N", new = "V1") + + table_results <- rbind(table_orig_results, table_sim_results) + + + + # add missing combinations from original or simulations + # probably not needed anymore + all_simulation_ints <- as.character(unique(table_results[ + orig == "simulations"]$unified_int)) + all_original_ints <- as.character(unique(table_results[ + orig == "original"]$unified_int)) + missing_in_original <- all_simulation_ints[ + !all_simulation_ints %in% all_original_ints] + missing_in_simulations <- all_original_ints[ + !all_original_ints %in% all_simulation_ints] + create_missing_for_original <- table_results[ + unified_int %in% missing_in_original] + create_missing_for_original <- unique(create_missing_for_original[ + , c("orig", "V1") := list("original", 0)]) + create_missing_for_simulations <- table_results[ + unified_int %in% missing_in_simulations] + create_missing_for_simulations <- unique( + create_missing_for_simulations[, c("orig", "V1") := list( + "simulations", 0)]) + + table_results <- do.call( + "rbind", + list(table_results, create_missing_for_original, + create_missing_for_simulations)) + + + ## p-values + combo_list <- rep(NA, length = length(unique(table_results$unified_int))) + p_high <- rep(NA, length = length(unique(table_results$unified_int))) + p_low <- rep(NA, length = length(unique(table_results$unified_int))) + + for (int_combo in seq_along(unique(table_results$unified_int))) { + this_combo <- as.character(unique(table_results$unified_int)[int_combo]) + + sub <- table_results[unified_int == this_combo] + + orig_value <- sub[orig == "original"]$V1 + sim_values <- sub[orig == "simulations"]$V1 + + length_simulations <- length(sim_values) + if (length_simulations != number_of_simulations) { + additional_length_needed <- number_of_simulations - + length_simulations + sim_values <- c(sim_values, rep(0, additional_length_needed)) + } + + p_orig_higher <- 1 - (sum((orig_value + 1) > (sim_values + 1)) / + number_of_simulations) + p_orig_lower <- 1 - (sum((orig_value + 1) < (sim_values + 1)) / + number_of_simulations) + + combo_list[[int_combo]] <- this_combo + p_high[[int_combo]] <- p_orig_higher + p_low[[int_combo]] <- p_orig_lower } - - p_orig_higher = 1 - (sum((orig_value+1) > (sim_values+1))/number_of_simulations) - p_orig_lower = 1 - (sum((orig_value+1) < (sim_values+1))/number_of_simulations) - - combo_list[[int_combo]] = this_combo - p_high[[int_combo]] = p_orig_higher - p_low[[int_combo]] = p_orig_lower - - } - res_pvalue_DT = data.table::data.table(unified_int = as.vector(combo_list), p_higher_orig = p_high, p_lower_orig = p_low) + res_pvalue_DT <- data.table::data.table( + unified_int = as.vector(combo_list), + p_higher_orig = p_high, + p_lower_orig = p_low) - # depletion or enrichment in barplot format - table_mean_results <- table_results[, .(mean(V1)), by = c('orig', 'unified_int', 'type_int')] - table_mean_results_dc <- data.table::dcast.data.table(data = table_mean_results, formula = type_int+unified_int~orig, value.var = 
'V1') - table_mean_results_dc[, original := ifelse(is.na(original), 0, original)] - table_mean_results_dc[, enrichm := log2((original+1)/(simulations+1))] + # depletion or enrichment in barplot format + table_mean_results <- table_results[, .(mean(V1)), by = c( + "orig", "unified_int", "type_int")] + table_mean_results_dc <- data.table::dcast.data.table( + data = table_mean_results, formula = type_int + unified_int ~ orig, + value.var = "V1") + table_mean_results_dc[, original := ifelse(is.na(original), 0, original)] + table_mean_results_dc[, enrichm := log2((original + 1) / (simulations + 1))] - table_mean_results_dc <- merge(table_mean_results_dc, res_pvalue_DT, by = 'unified_int') - data.table::setorder(table_mean_results_dc, enrichm) - table_mean_results_dc[, unified_int := factor(unified_int, unified_int)] + table_mean_results_dc <- merge( + table_mean_results_dc, res_pvalue_DT, by = "unified_int") + data.table::setorder(table_mean_results_dc, enrichm) + table_mean_results_dc[, unified_int := factor(unified_int, unified_int)] - # adjust p-values for mht + # adjust p-values for mht - # data.table variables - p.adj_higher = p.adj_lower = p_lower_orig = p_higher_orig = PI_value = int_ranking = NULL + # data.table variables + p.adj_higher <- p.adj_lower <- p_lower_orig <- p_higher_orig <- + PI_value <- int_ranking <- NULL - table_mean_results_dc[, p.adj_higher := stats::p.adjust(p_higher_orig, method = sel_adjust_method)] - table_mean_results_dc[, p.adj_lower := stats::p.adjust(p_lower_orig, method = sel_adjust_method)] + table_mean_results_dc[, p.adj_higher := stats::p.adjust( + p_higher_orig, method = sel_adjust_method)] + table_mean_results_dc[, p.adj_lower := stats::p.adjust( + p_lower_orig, method = sel_adjust_method)] - table_mean_results_dc[, PI_value := ifelse(p.adj_higher <= p.adj_lower, - -log10(p.adj_higher+(1/number_of_simulations))*enrichm, - -log10(p.adj_lower+(1/number_of_simulations))*enrichm)] - data.table::setorder(table_mean_results_dc, PI_value) + table_mean_results_dc[, PI_value := ifelse(p.adj_higher <= p.adj_lower, + -log10(p.adj_higher + (1 / number_of_simulations)) * enrichm, + -log10(p.adj_lower + (1 / number_of_simulations)) * enrichm + )] + data.table::setorder(table_mean_results_dc, PI_value) - # order - table_mean_results_dc <- table_mean_results_dc[order(-PI_value)] - table_mean_results_dc[, int_ranking := 1:.N] - - return(list(raw_sim_table = table_results, enrichm_res = table_mean_results_dc)) + # order + table_mean_results_dc <- table_mean_results_dc[order(-PI_value)] + table_mean_results_dc[, int_ranking := 1:.N] + return(list(raw_sim_table = table_results, + enrichm_res = table_mean_results_dc)) } #' @title addCellIntMetadata #' @name addCellIntMetadata -#' @description Creates an additional metadata column with information about interacting and non-interacting cell types of the +#' @description Creates an additional metadata column with information about +#' interacting and non-interacting cell types of the #' selected cell-cell interaction. #' @param gobject giotto object #' @param feat_type feature type @@ -281,87 +329,107 @@ cellProximityEnrichment <- function(gobject, #' @param name name for the new metadata column #' @param return_gobject return an updated giotto object #' @return Giotto object -#' @details This function will create an additional metadata column which selects interacting cell types for a specific cell-cell -#' interaction. 
For example, if you want to color interacting astrocytes and oligodendrocytes it will create a new metadata column with -#' the values "select_astrocytes", "select_oligodendrocytes", "other_astrocytes", "other_oligodendroyctes" and "other". Where "other" is all -#' other cell types found within the selected cell type column. +#' @details This function will create an additional metadata column which +#' selects interacting cell types for a specific cell-cell +#' interaction. For example, if you want to color interacting astrocytes and +#' oligodendrocytes it will create a new metadata column with +#' the values "select_astrocytes", "select_oligodendrocytes", +#' "other_astrocytes", "other_oligodendroyctes" and "other". Where "other" is +#' all other cell types found within the selected cell type column. #' @export -addCellIntMetadata = function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network = 'spatial_network', - cluster_column, - cell_interaction, - name = 'select_int', - return_gobject = TRUE) { - - # set spatial unit and feature type - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - - if(is.null(spatial_network)) { - stop('spatial_network must be provided, this must be an existing spatial network \n') - } - - if(is.null(cluster_column)) { - stop('cluster_column must be provided, this must be an existing cell metadata column, see pData(your_giotto_object) \n') - } - - if(is.null(cell_interaction)) { - stop('cell_interaction must be provided, this must be cell--cell interaction between cell types in cluster_column \n') - } - - # create spatial network - spatial_network_annot = annotateSpatialNetwork(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network, - cluster_column = cluster_column) - - # selected vs other cells - - # data.table variables - unified_int = cell_ID = NULL - - selected_cells = unique(c(spatial_network_annot[unified_int == cell_interaction]$to, - spatial_network_annot[unified_int == cell_interaction]$from)) - - # cell_metadata = data.table::copy(pDataDT(gobject, feat_type = feat_type)) - cell_metadata = getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'cellMetaObj', - copy_obj = TRUE) - - cell_type_1 = strsplit(cell_interaction, split = '--')[[1]][1] - cell_type_2 = strsplit(cell_interaction, split = '--')[[1]][2] - - cell_metadata[][, c(name) := ifelse(!get(cluster_column) %in% c(cell_type_1, cell_type_2), 'other', - ifelse(get(cluster_column) == cell_type_1 & cell_ID %in% selected_cells, paste0("select_", cell_type_1), - ifelse(get(cluster_column) == cell_type_2 & cell_ID %in% selected_cells, paste0("select_", cell_type_2), - ifelse(get(cluster_column) == cell_type_1, paste0("other_", cell_type_1), paste0("other_", cell_type_2)))))] - - if(return_gobject == TRUE) { - - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - # gobject@cell_metadata[[spat_unit]][[feat_type]] = cell_metadata - gobject = set_cell_metadata(gobject, - metadata = cell_metadata, - verbose = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_add_cell_int_info') - - return(gobject) - } else { - return(cell_metadata) - } +addCellIntMetadata <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + 
spatial_network = "spatial_network", + cluster_column, + cell_interaction, + name = "select_int", + return_gobject = TRUE) { + # set spatial unit and feature type + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + + if (is.null(spatial_network)) { + stop("spatial_network must be provided, this must be an existing + spatial network") + } + + if (is.null(cluster_column)) { + stop("cluster_column must be provided, this must be an existing cell + metadata column, see pData(your_giotto_object)") + } + + if (is.null(cell_interaction)) { + stop("cell_interaction must be provided, this must be cell--cell + interaction between cell types in cluster_column") + } + + # create spatial network + spatial_network_annot <- annotateSpatialNetwork( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network, + cluster_column = cluster_column + ) + + # selected vs other cells + + # data.table variables + unified_int <- cell_ID <- NULL + + selected_cells <- unique(c( + spatial_network_annot[unified_int == cell_interaction]$to, + spatial_network_annot[unified_int == cell_interaction]$from + )) + + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "cellMetaObj", + copy_obj = TRUE + ) + cell_type_1 <- strsplit(cell_interaction, split = "--")[[1]][1] + cell_type_2 <- strsplit(cell_interaction, split = "--")[[1]][2] + + cell_metadata[][, c(name) := ifelse(!get(cluster_column) %in% c( + cell_type_1, cell_type_2), "other", + ifelse(get(cluster_column) == cell_type_1 & cell_ID %in% selected_cells, + paste0("select_", cell_type_1), + ifelse(get(cluster_column) == cell_type_2 & cell_ID %in% + selected_cells, paste0("select_", cell_type_2), + ifelse(get(cluster_column) == cell_type_1, + paste0("other_", cell_type_1), + paste0("other_", cell_type_2)) + ) + ) + )] + + if (return_gobject == TRUE) { + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- set_cell_metadata(gobject, + metadata = cell_metadata, + verbose = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + + ## update parameters used ## + gobject <- update_giotto_params(gobject, + description = "_add_cell_int_info") + + return(gobject) + } else { + return(cell_metadata) + } } @@ -380,42 +448,44 @@ NULL #' @describeIn cell_proximity_tests t.test #' @keywords internal -.do_ttest = function(expr_values, - select_ind, - other_ind, - adjust_method, - mean_method, - offset = 0.1) { - - vmsg(.is_debug = TRUE, ".do_ttest") - - # data.table variables - p.value = p.adj = NULL - - mean_sel = my_rowMeans(expr_values[,select_ind], method = mean_method, offset = offset) - mean_all = my_rowMeans(expr_values[,other_ind], method = mean_method, offset = offset) - - #if(length(select_ind) == 1){mean_sel = expr_values[,select_ind]} else{mean_sel = rowMeans(expr_values[,select_ind])} - #if(length(other_ind) == 1){mean_all = expr_values[,other_ind]} else{mean_all = rowMeans(expr_values[,other_ind])} - - if(length(select_ind) == 1 | length(other_ind) == 1) { - results = NaN - } else { - results = apply(expr_values, MARGIN = 1, function(x) { - p.value = stats::t.test(x[select_ind], x[other_ind])$p.value - }) - } +.do_ttest <- function(expr_values, + select_ind, + other_ind, + adjust_method, + mean_method, + offset = 0.1) { + vmsg(.is_debug = TRUE, ".do_ttest") + + # 
data.table variables + p.value <- p.adj <- NULL + + mean_sel <- my_rowMeans( + expr_values[, select_ind], method = mean_method, offset = offset) + mean_all <- my_rowMeans( + expr_values[, other_ind], method = mean_method, offset = offset) + + if (length(select_ind) == 1 | length(other_ind) == 1) { + results <- NaN + } else { + results <- apply(expr_values, MARGIN = 1, function(x) { + p.value <- stats::t.test(x[select_ind], x[other_ind])$p.value + }) + } - # other info - log2fc = log2((mean_sel+offset)/(mean_all+offset)) - diff = mean_sel - mean_all + # other info + log2fc <- log2((mean_sel + offset) / (mean_all + offset)) + diff <- mean_sel - mean_all - resultsDT = data.table('feats' = rownames(expr_values), 'sel' = mean_sel, 'other' = mean_all, 'log2fc' = log2fc, 'diff' = diff, 'p.value' = unlist(results)) - resultsDT[, p.value := ifelse(is.nan(p.value), 1, p.value)] - resultsDT[, p.adj := stats::p.adjust(p.value, method = adjust_method)] - setorder(resultsDT, p.adj) + resultsDT <- data.table( + "feats" = rownames(expr_values), + "sel" = mean_sel, "other" = mean_all, + "log2fc" = log2fc, "diff" = diff, + "p.value" = unlist(results)) + resultsDT[, p.value := ifelse(is.nan(p.value), 1, p.value)] + resultsDT[, p.adj := stats::p.adjust(p.value, method = adjust_method)] + setorder(resultsDT, p.adj) - return(resultsDT) + return(resultsDT) } @@ -423,61 +493,66 @@ NULL #' @describeIn cell_proximity_tests limma t.test #' @keywords internal -.do_limmatest = function(expr_values, - select_ind, - other_ind, - mean_method, - offset = 0.1) { - - vmsg(.is_debug = TRUE, ".do_limmatest") - - # data.table variables - sel = other = feats = P.Value = adj.P.Val = p.adj = NULL - - expr_values_subset = cbind(expr_values[,select_ind], expr_values[,other_ind]) - mygroups = c(rep('sel', length(select_ind)), rep('other', length(other_ind))) - mygroups = factor(mygroups, levels = unique(mygroups)) - - design = stats::model.matrix(~0+mygroups) - colnames(design) = levels(mygroups) - fit = limma::lmFit(expr_values_subset, design = design) - - - cont.matrix = limma::makeContrasts( - sel_vs_other = sel-other, - levels = design - ) - - fitcontrast = limma::contrasts.fit(fit, cont.matrix) - fitc_ebayes = limma::eBayes(fitcontrast) - - # limma to DT - limma_result = limma::topTable(fitc_ebayes, coef = 1,number = 100000, confint = T) - limmaDT = data.table::as.data.table(limma_result); limmaDT[, feats := rownames(limma_result)] - - # other info - mean_sel = my_rowMeans(expr_values[,select_ind], method = mean_method, offset = offset) - mean_all = my_rowMeans(expr_values[,other_ind], method = mean_method, offset = offset) - - #if(length(select_ind) == 1){mean_sel = expr_values[,select_ind]} else{mean_sel = rowMeans(expr_values[,select_ind])} - #if(length(other_ind) == 1){mean_all = expr_values[,other_ind]} else{mean_all = rowMeans(expr_values[,other_ind])} - - log2fc = log2((mean_sel+offset)/(mean_all+offset)) - diff = mean_sel - mean_all - - tempDT = data.table::data.table('feats' = rownames(expr_values), - 'sel'= mean_sel, - 'other' = mean_all, - 'log2fc' = log2fc, - 'diff' = diff) - limmaDT = data.table::merge.data.table(limmaDT, tempDT, by = 'feats') - limmaDT = limmaDT[,.(feats, sel, other, log2fc, diff, P.Value, adj.P.Val)] - colnames(limmaDT) = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj') +.do_limmatest <- function(expr_values, + select_ind, + other_ind, + mean_method, + offset = 0.1) { + vmsg(.is_debug = TRUE, ".do_limmatest") + + # data.table variables + sel <- other <- feats <- P.Value <- 
adj.P.Val <- p.adj <- NULL + + expr_values_subset <- cbind( + expr_values[, select_ind], expr_values[, other_ind]) + mygroups <- c(rep("sel", length(select_ind)), + rep("other", length(other_ind))) + mygroups <- factor(mygroups, levels = unique(mygroups)) + + design <- stats::model.matrix(~ 0 + mygroups) + colnames(design) <- levels(mygroups) + fit <- limma::lmFit(expr_values_subset, design = design) + + + cont.matrix <- limma::makeContrasts( + sel_vs_other = sel - other, + levels = design + ) - setorder(limmaDT, p.adj) + fitcontrast <- limma::contrasts.fit(fit, cont.matrix) + fitc_ebayes <- limma::eBayes(fitcontrast) + + # limma to DT + limma_result <- limma::topTable( + fitc_ebayes, coef = 1, number = 100000, confint = TRUE) + limmaDT <- data.table::as.data.table(limma_result) + limmaDT[, feats := rownames(limma_result)] + + # other info + mean_sel <- my_rowMeans( + expr_values[, select_ind], method = mean_method, offset = offset) + mean_all <- my_rowMeans( + expr_values[, other_ind], method = mean_method, offset = offset) + + log2fc <- log2((mean_sel + offset) / (mean_all + offset)) + diff <- mean_sel - mean_all + + tempDT <- data.table::data.table( + "feats" = rownames(expr_values), + "sel" = mean_sel, + "other" = mean_all, + "log2fc" = log2fc, + "diff" = diff + ) + limmaDT <- data.table::merge.data.table(limmaDT, tempDT, by = "feats") + limmaDT <- limmaDT[ + , .(feats, sel, other, log2fc, diff, P.Value, adj.P.Val)] + colnames(limmaDT) <- c("feats", "sel", "other", "log2fc", "diff", + "p.value", "p.adj") - return(limmaDT) + setorder(limmaDT, p.adj) + return(limmaDT) } @@ -485,214 +560,221 @@ NULL #' @describeIn cell_proximity_tests wilcoxon #' @keywords internal -.do_wilctest = function(expr_values, - select_ind, - other_ind, - adjust_method, - mean_method, - offset = 0.1) { - - vmsg(.is_debug = TRUE, ".do_wilctest") - - # data.table variables - p.value = p.adj = NULL - - mean_sel = my_rowMeans(expr_values[,select_ind], method = mean_method, offset = offset) - mean_all = my_rowMeans(expr_values[,other_ind], method = mean_method, offset = offset) - - #if(length(select_ind) == 1){mean_sel = expr_values[,select_ind]} else{mean_sel = rowMeans(expr_values[,select_ind])} - #if(length(other_ind) == 1){mean_all = expr_values[,other_ind]} else{mean_all = rowMeans(expr_values[,other_ind])} - - if(length(select_ind) == 1 | length(other_ind) == 1) { - results = NaN - } else { - results = apply(expr_values, MARGIN = 1, function(x) { - p.value = stats::wilcox.test(x[select_ind], x[other_ind])$p.value - }) - } - - # other info - log2fc = log2((mean_sel+offset)/(mean_all+offset)) - diff = mean_sel - mean_all - - resultsDT = data.table('feats' = rownames(expr_values), 'sel' = mean_sel, 'other' = mean_all, 'log2fc' = log2fc, 'diff' = diff, 'p.value' = unlist(results)) - resultsDT[, p.value := ifelse(is.nan(p.value), 1, p.value)] - resultsDT[, p.adj := stats::p.adjust(p.value, method = adjust_method)] - setorder(resultsDT, p.adj) - - return(resultsDT) +.do_wilctest <- function(expr_values, + select_ind, + other_ind, + adjust_method, + mean_method, + offset = 0.1) { + vmsg(.is_debug = TRUE, ".do_wilctest") + + # data.table variables + p.value <- p.adj <- NULL + + mean_sel <- my_rowMeans( + expr_values[, select_ind], method = mean_method, offset = offset) + mean_all <- my_rowMeans( + expr_values[, other_ind], method = mean_method, offset = offset) + + if (length(select_ind) == 1 | length(other_ind) == 1) { + results <- NaN + } else { + results <- apply(expr_values, MARGIN = 1, function(x) { + p.value <- 
stats::wilcox.test(x[select_ind], x[other_ind])$p.value + }) + } + # other info + log2fc <- log2((mean_sel + offset) / (mean_all + offset)) + diff <- mean_sel - mean_all + + resultsDT <- data.table( + "feats" = rownames(expr_values), + "sel" = mean_sel, + "other" = mean_all, + "log2fc" = log2fc, + "diff" = diff, + "p.value" = unlist(results)) + resultsDT[, p.value := ifelse(is.nan(p.value), 1, p.value)] + resultsDT[, p.adj := stats::p.adjust(p.value, method = adjust_method)] + setorder(resultsDT, p.adj) + + return(resultsDT) } # calculate original values -.do_permuttest_original = function(expr_values, - select_ind, - other_ind, - name = 'orig', - mean_method, - offset = 0.1) { - - # data.table variables - feats = NULL - - mean_sel = my_rowMeans(expr_values[,select_ind], method = mean_method, offset = offset) - mean_all = my_rowMeans(expr_values[,other_ind], method = mean_method, offset = offset) - - #if(length(select_ind) == 1){mean_sel = expr_values[,select_ind]} else{mean_sel = rowMeans(expr_values[,select_ind])} - #if(length(other_ind) == 1){mean_all = expr_values[,other_ind]} else{mean_all = rowMeans(expr_values[,other_ind])} - - log2fc = log2((mean_sel+offset)/(mean_all+offset)) - diff = mean_sel - mean_all - - resultsDT = data.table('sel' = mean_sel, 'other' = mean_all, 'log2fc' = log2fc, 'diff' = diff) - resultsDT[, feats := rownames(expr_values)] - resultsDT[, name := name] - - return(resultsDT) - +.do_permuttest_original <- function(expr_values, + select_ind, + other_ind, + name = "orig", + mean_method, + offset = 0.1) { + # data.table variables + feats <- NULL + + mean_sel <- my_rowMeans(expr_values[ + , select_ind], method = mean_method, offset = offset) + mean_all <- my_rowMeans(expr_values[ + , other_ind], method = mean_method, offset = offset) + + log2fc <- log2((mean_sel + offset) / (mean_all + offset)) + diff <- mean_sel - mean_all + + resultsDT <- data.table( + "sel" = mean_sel, "other" = mean_all, "log2fc" = log2fc, "diff" = diff) + resultsDT[, feats := rownames(expr_values)] + resultsDT[, name := name] + + return(resultsDT) } # calculate random values -.do_permuttest_random = function(expr_values, - select_ind, - other_ind, - name = 'perm_1', - mean_method, - offset = 0.1, - set_seed = TRUE, - seed_number = 1234) { - - # data.table variables - feats = NULL - - l_select_ind = length(select_ind) - l_other_ind = length(other_ind) - - all_ind = c(select_ind, other_ind) - - if(set_seed == TRUE) { - set.seed(seed = seed_number) - } - random_select = sample(all_ind, size = l_select_ind, replace = F) - random_other = all_ind[!all_ind %in% random_select] - - # alternative - mean_sel = my_rowMeans(expr_values[,random_select], method = mean_method, offset = offset) - mean_all = my_rowMeans(expr_values[,random_other], method = mean_method, offset = offset) - #if(length(select_ind) == 1){mean_sel = expr_values[,random_select]} else{mean_sel = rowMeans(expr_values[,random_select])} - #if(length(other_ind) == 1){mean_all = expr_values[,random_other]} else{mean_all = rowMeans(expr_values[,random_other])} +.do_permuttest_random <- function(expr_values, + select_ind, + other_ind, + name = "perm_1", + mean_method, + offset = 0.1, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + feats <- NULL + + l_select_ind <- length(select_ind) + l_other_ind <- length(other_ind) + + all_ind <- c(select_ind, other_ind) + + if (set_seed == TRUE) { + set.seed(seed = seed_number) + } + random_select <- sample(all_ind, size = l_select_ind, replace = FALSE) + random_other <- 
all_ind[!all_ind %in% random_select] - log2fc = log2((mean_sel+offset)/(mean_all+offset)) - diff = mean_sel - mean_all + # alternative + mean_sel <- my_rowMeans( + expr_values[, random_select], method = mean_method, offset = offset) + mean_all <- my_rowMeans( + expr_values[, random_other], method = mean_method, offset = offset) - resultsDT = data.table('sel' = mean_sel, 'other' = mean_all, 'log2fc' = log2fc, 'diff' = diff) - resultsDT[, feats := rownames(expr_values)] - resultsDT[, name := name] + log2fc <- log2((mean_sel + offset) / (mean_all + offset)) + diff <- mean_sel - mean_all - return(resultsDT) + resultsDT <- data.table( + "sel" = mean_sel, "other" = mean_all, "log2fc" = log2fc, "diff" = diff) + resultsDT[, feats := rownames(expr_values)] + resultsDT[, name := name] + return(resultsDT) } # calculate multiple random values -.do_multi_permuttest_random = function(expr_values, - select_ind, - other_ind, - mean_method, - offset = 0.1, - n = 100, - set_seed = TRUE, - seed_number = 1234) { - - if(set_seed == TRUE) { - seed_number_list = seed_number:(seed_number + (n-1)) - } - - result = lapply(X = 1:n, FUN = function(x) { - - - - seed_number = seed_number_list[x] - - perm_rand = .do_permuttest_random(expr_values = expr_values, - select_ind = select_ind, - other_ind = other_ind, - name = paste0('perm_', x), - mean_method = mean_method, - offset = offset, - set_seed = set_seed, - seed_number = seed_number) - - }) +.do_multi_permuttest_random <- function(expr_values, + select_ind, + other_ind, + mean_method, + offset = 0.1, + n = 100, + set_seed = TRUE, + seed_number = 1234) { + if (set_seed == TRUE) { + seed_number_list <- seed_number:(seed_number + (n - 1)) + } - final_result = do.call('rbind', result) + result <- lapply(X = 1:n, FUN = function(x) { + seed_number <- seed_number_list[x] + + perm_rand <- .do_permuttest_random( + expr_values = expr_values, + select_ind = select_ind, + other_ind = other_ind, + name = paste0("perm_", x), + mean_method = mean_method, + offset = offset, + set_seed = set_seed, + seed_number = seed_number + ) + }) + final_result <- do.call("rbind", result) } #' @describeIn cell_proximity_tests random permutation #' @keywords internal -.do_permuttest = function(expr_values, - select_ind, other_ind, - n_perm = 1000, - adjust_method = 'fdr', - mean_method, - offset = 0.1, - set_seed = TRUE, - seed_number = 1234) { - - - # data.table variables - log2fc_diff = log2fc = sel = other = feats = p_higher = p_lower = perm_sel = NULL - perm_other = perm_log2fc = perm_diff = p.value = p.adj = NULL - - ## original data - vmsg(.is_debug = TRUE, "ok1") - original = .do_permuttest_original(expr_values = expr_values, - select_ind = select_ind, other_ind = other_ind, - name = 'orig', - mean_method = mean_method, - offset = offset) - - ## random permutations - vmsg(.is_debug = TRUE, "ok2") - random_perms = .do_multi_permuttest_random(expr_values = expr_values, - n = n_perm, - select_ind = select_ind, - other_ind = other_ind, - mean_method = mean_method, - offset = offset, - set_seed = set_seed, - seed_number = seed_number) - - ## - random_perms[, log2fc_diff := rep(original$log2fc, n_perm) - log2fc] - random_perms[, c('perm_sel', 'perm_other', 'perm_log2fc', 'perm_diff') := list(mean(sel), mean(other), mean(log2fc), mean(diff)), by = feats] - - ## get p-values - random_perms[, p_higher := sum(log2fc_diff > 0), by = feats] - random_perms[, p_higher := 1-(p_higher/n_perm)] - random_perms[, p_lower := sum(log2fc_diff < 0), by = feats] - random_perms[, p_lower := 1-(p_lower/n_perm)] - - 
## combine results permutation and original - random_perms_res = unique(random_perms[,.(feats, perm_sel, perm_other, perm_log2fc, perm_diff, p_higher, p_lower)]) - results_m = data.table::merge.data.table(random_perms_res, original[,.(feats, sel, other, log2fc, diff)], by = 'feats') - - # select lowest p-value and perform p.adj - results_m[, p.value := ifelse(p_higher <= p_lower, p_higher, p_lower)] - results_m[, p.adj := stats::p.adjust(p.value, method = adjust_method)] - - results_m = results_m[,.(feats, sel, other, log2fc, diff, p.value, p.adj, perm_sel, perm_other, perm_log2fc, perm_diff)] - setorder(results_m, p.adj, -log2fc) - - return(results_m) +.do_permuttest <- function(expr_values, + select_ind, other_ind, + n_perm = 1000, + adjust_method = "fdr", + mean_method, + offset = 0.1, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + log2fc_diff <- log2fc <- sel <- other <- feats <- p_higher <- p_lower <- + perm_sel <- NULL + perm_other <- perm_log2fc <- perm_diff <- p.value <- p.adj <- NULL + + ## original data + vmsg(.is_debug = TRUE, "ok1") + original <- .do_permuttest_original( + expr_values = expr_values, + select_ind = select_ind, other_ind = other_ind, + name = "orig", + mean_method = mean_method, + offset = offset + ) + + ## random permutations + vmsg(.is_debug = TRUE, "ok2") + random_perms <- .do_multi_permuttest_random( + expr_values = expr_values, + n = n_perm, + select_ind = select_ind, + other_ind = other_ind, + mean_method = mean_method, + offset = offset, + set_seed = set_seed, + seed_number = seed_number + ) + ## + random_perms[, log2fc_diff := rep(original$log2fc, n_perm) - log2fc] + random_perms[, + c("perm_sel", "perm_other", "perm_log2fc", "perm_diff") := list( + mean(sel), mean(other), mean(log2fc), mean(diff)), + by = feats] + + ## get p-values + random_perms[, p_higher := sum(log2fc_diff > 0), by = feats] + random_perms[, p_higher := 1 - (p_higher / n_perm)] + random_perms[, p_lower := sum(log2fc_diff < 0), by = feats] + random_perms[, p_lower := 1 - (p_lower / n_perm)] + + ## combine results permutation and original + random_perms_res <- unique(random_perms[ + , .(feats, perm_sel, perm_other, perm_log2fc, perm_diff, p_higher, + p_lower)]) + results_m <- data.table::merge.data.table( + random_perms_res, original[, .(feats, sel, other, log2fc, diff)], + by = "feats") + + # select lowest p-value and perform p.adj + results_m[, p.value := ifelse(p_higher <= p_lower, p_higher, p_lower)] + results_m[, p.adj := stats::p.adjust(p.value, method = adjust_method)] + + results_m <- results_m[ + , .(feats, sel, other, log2fc, diff, p.value, p.adj, perm_sel, + perm_other, perm_log2fc, perm_diff)] + setorder(results_m, p.adj, -log2fc) + + return(results_m) } @@ -703,55 +785,62 @@ NULL #' @param expr_values Matrix object #' @keywords internal #' @seealso [cell_proximity_tests] -.do_cell_proximity_test = function(expr_values, - select_ind, other_ind, - diff_test = c('permutation', 'limma', 't.test', 'wilcox'), - mean_method = c('arithmic', 'geometric'), - offset = 0.1, - n_perm = 100, - adjust_method = c("bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "fdr", "none"), - set_seed = TRUE, - seed_number = 1234, - verbose = FALSE) { - - # get parameters - diff_test = match.arg(diff_test, choices = c('permutation', 'limma', 't.test', 'wilcox')) - adjust_method = match.arg(adjust_method, choices = c("bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "fdr", "none")) - mean_method = match.arg(mean_method, choices = c('arithmic', 'geometric')) - - 
vmsg(.is_debug = TRUE, ".do_cell_proximity_test") - - if(diff_test == 'permutation') { - result = .do_permuttest(expr_values = expr_values, - select_ind = select_ind, other_ind = other_ind, - n_perm = n_perm, adjust_method = adjust_method, - mean_method = mean_method, offset = offset, - set_seed = set_seed, - seed_number = seed_number) - - } else if(diff_test == 'limma') { - result = .do_limmatest(expr_values = expr_values, - select_ind = select_ind, other_ind = other_ind, - mean_method = mean_method, offset = offset) - - } else if(diff_test == 't.test') { - result = .do_ttest(expr_values = expr_values, - select_ind = select_ind, other_ind = other_ind, - mean_method = mean_method, offset = offset, - adjust_method = adjust_method) - - } else if(diff_test == 'wilcox') { - result = .do_wilctest(expr_values = expr_values, - select_ind = select_ind, other_ind = other_ind, - mean_method = mean_method, offset = offset, - adjust_method = adjust_method) - - } - - return(result) +.do_cell_proximity_test <- function(expr_values, + select_ind, other_ind, + diff_test = c("permutation", "limma", "t.test", "wilcox"), + mean_method = c("arithmic", "geometric"), + offset = 0.1, + n_perm = 100, + adjust_method = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + ), + set_seed = TRUE, + seed_number = 1234, + verbose = FALSE) { + # get parameters + diff_test <- match.arg( + diff_test, choices = c("permutation", "limma", "t.test", "wilcox")) + adjust_method <- match.arg(adjust_method, choices = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + )) + mean_method <- match.arg(mean_method, choices = c("arithmic", "geometric")) + + vmsg(.is_debug = TRUE, ".do_cell_proximity_test") + + if (diff_test == "permutation") { + result <- .do_permuttest( + expr_values = expr_values, + select_ind = select_ind, other_ind = other_ind, + n_perm = n_perm, adjust_method = adjust_method, + mean_method = mean_method, offset = offset, + set_seed = set_seed, + seed_number = seed_number + ) + } else if (diff_test == "limma") { + result <- .do_limmatest( + expr_values = expr_values, + select_ind = select_ind, other_ind = other_ind, + mean_method = mean_method, offset = offset + ) + } else if (diff_test == "t.test") { + result <- .do_ttest( + expr_values = expr_values, + select_ind = select_ind, other_ind = other_ind, + mean_method = mean_method, offset = offset, + adjust_method = adjust_method + ) + } else if (diff_test == "wilcox") { + result <- .do_wilctest( + expr_values = expr_values, + select_ind = select_ind, other_ind = other_ind, + mean_method = mean_method, offset = offset, + adjust_method = adjust_method + ) + } + return(result) } @@ -759,185 +848,189 @@ NULL #' @title Find cell proximity features per interaction #' @name .findCellProximityFeats_per_interaction -#' @description Identifies features that are differentially expressed due to proximity to other cell types. +#' @description Identifies features that are differentially expressed due to +#' proximity to other cell types. 
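The permutation branch dispatched above scores each feature by how far the observed log2 fold-change between interacting ("select") and non-interacting ("other") cells departs from fold-changes obtained after reshuffling those two groups. A minimal, self-contained sketch of that logic on a toy matrix; all names here (toy_expr, sel_idx, oth_idx) are illustrative and are not the package's internal helpers:

# toy expression matrix: 5 features x 20 cells
set.seed(1234)
toy_expr <- matrix(rpois(100, lambda = 5), nrow = 5,
                   dimnames = list(paste0("feat", 1:5), paste0("cell", 1:20)))
sel_idx <- 1:6   # "interacting" cells
oth_idx <- 7:20  # remaining cells of the same type
offset <- 0.1
n_perm <- 100

obs_log2fc <- log2((rowMeans(toy_expr[, sel_idx]) + offset) /
                   (rowMeans(toy_expr[, oth_idx]) + offset))

# re-sample the select/other split n_perm times and collect log2fc differences
perm_diffs <- sapply(seq_len(n_perm), function(i) {
    rand_sel <- sample(c(sel_idx, oth_idx), size = length(sel_idx))
    rand_oth <- setdiff(c(sel_idx, oth_idx), rand_sel)
    perm_log2fc <- log2((rowMeans(toy_expr[, rand_sel]) + offset) /
                        (rowMeans(toy_expr[, rand_oth]) + offset))
    obs_log2fc - perm_log2fc
})

# empirical p-values, mirroring the p_higher / p_lower logic of the
# permutation test: take the smaller tail and adjust for multiple testing
p_higher <- 1 - rowSums(perm_diffs > 0) / n_perm
p_lower  <- 1 - rowSums(perm_diffs < 0) / n_perm
p_value  <- pmin(p_higher, p_lower)
p_adj    <- p.adjust(p_value, method = "fdr")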
#' @param expr_values Matrix object #' @keywords internal #' @seealso [.do_cell_proximity_test()] for specific tests -.findCellProximityFeats_per_interaction = function(sel_int, - expr_values, - cell_metadata, - annot_spatnetwork, - cluster_column = NULL, - minimum_unique_cells = 1, - minimum_unique_int_cells = 1, - exclude_selected_cells_from_test = T, - diff_test = c('permutation', 'limma', 't.test', 'wilcox'), - mean_method = c('arithmic', 'geometric'), - offset = 0.1, - adjust_method = 'bonferroni', - nr_permutations = 100, - set_seed = TRUE, - seed_number = 1234) { - - - # data.table variables - unified_int = to_cell_type = from_cell_type = cell_type = int_cell_type = NULL - nr_select = int_nr_select = nr_other = int_nr_other = unif_int = NULL - - # select test to perform - diff_test = match.arg(arg = diff_test, choices = c('permutation', 'limma', 't.test', 'wilcox')) - - # select subnetwork - sub_spatnetwork = annot_spatnetwork[unified_int == sel_int] - - # unique cell types - unique_cell_types = unique(c(sub_spatnetwork$to_cell_type, sub_spatnetwork$from_cell_type)) - - if(length(unique_cell_types) == 2) { - - first_cell_type = unique_cell_types[1] - second_cell_type = unique_cell_types[2] - - # first cell type ids - to1 = sub_spatnetwork[to_cell_type == first_cell_type][['to']] - from1 = sub_spatnetwork[from_cell_type == first_cell_type][['from']] - cell1_ids = unique(c(to1, from1)) - - # second cell type ids - to2 = sub_spatnetwork[to_cell_type == second_cell_type][['to']] - from2 = sub_spatnetwork[from_cell_type == second_cell_type][['from']] - cell2_ids = unique(c(to2, from2)) - - ## all cell ids - all_cell1 = cell_metadata[get(cluster_column) == first_cell_type][['cell_ID']] - all_cell2 = cell_metadata[get(cluster_column) == second_cell_type][['cell_ID']] - - ## exclude selected - if(exclude_selected_cells_from_test == TRUE) { - all_cell1 = all_cell1[!all_cell1 %in% cell1_ids] - all_cell2 = all_cell2[!all_cell2 %in% cell2_ids] - } - - ## FOR CELL TYPE 1 - sel_ind1 = which(colnames(expr_values) %in% cell1_ids) - all_ind1 = which(colnames(expr_values) %in% all_cell1) - - ## FOR CELL TYPE 2 - sel_ind2 = which(colnames(expr_values) %in% cell2_ids) - all_ind2 = which(colnames(expr_values) %in% all_cell2) - - - ## do not continue if too few cells ## - if(length(sel_ind1) < minimum_unique_cells | length(all_ind1) < minimum_unique_cells | - length(sel_ind2) < minimum_unique_int_cells) { - result_cell_1 = NULL - } else { - - result_cell_1 = .do_cell_proximity_test(expr_values = expr_values, - select_ind = sel_ind1, - other_ind = all_ind1, - diff_test = diff_test, - n_perm = nr_permutations, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - set_seed = set_seed, - seed_number = seed_number) - result_cell_1[, cell_type := first_cell_type] - result_cell_1[, int_cell_type := second_cell_type] - result_cell_1[, nr_select := length(sel_ind1)] - result_cell_1[, int_nr_select := length(sel_ind2)] - result_cell_1[, nr_other := length(all_ind1)] - result_cell_1[, int_nr_other := length(all_ind2)] - - } - - - ## do not continue if too few cells ## - if(length(sel_ind2) < minimum_unique_cells | length(all_ind2) < minimum_unique_cells | - length(sel_ind1) < minimum_unique_int_cells) { - result_cell_2 = NULL - } else { - - result_cell_2 = .do_cell_proximity_test(expr_values = expr_values, - select_ind = sel_ind2, other_ind = all_ind2, - diff_test = diff_test, - n_perm = nr_permutations, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - 
set_seed = set_seed, - seed_number = seed_number) - result_cell_2[, cell_type := second_cell_type] - result_cell_2[, int_cell_type := first_cell_type] - result_cell_2[, nr_select := length(sel_ind2)] - result_cell_2[, int_nr_select := length(sel_ind1)] - result_cell_2[, nr_other := length(all_ind2)] - result_cell_2[, int_nr_other := length(all_ind1)] - - } - - - ## COMBINE - - if(is.null(result_cell_1) & is.null(result_cell_2)) { - return(NULL) - } else { - result_cells = rbind(result_cell_1, result_cell_2) - } - - } else if(length(unique_cell_types) == 1) { - - first_cell_type = unique_cell_types[1] - - # first cell type ids - to1 = sub_spatnetwork[to_cell_type == first_cell_type][['to']] - from1 = sub_spatnetwork[from_cell_type == first_cell_type][['from']] - cell1_ids = unique(c(to1, from1)) - - ## all cell ids - all_cell1 = cell_metadata[get(cluster_column) == first_cell_type][['cell_ID']] - - ## exclude selected - if(exclude_selected_cells_from_test == TRUE) { - all_cell1 = all_cell1[!all_cell1 %in% cell1_ids] +.findCellProximityFeats_per_interaction <- function(sel_int, + expr_values, + cell_metadata, + annot_spatnetwork, + cluster_column = NULL, + minimum_unique_cells = 1, + minimum_unique_int_cells = 1, + exclude_selected_cells_from_test = TRUE, + diff_test = c("permutation", "limma", "t.test", "wilcox"), + mean_method = c("arithmic", "geometric"), + offset = 0.1, + adjust_method = "bonferroni", + nr_permutations = 100, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + unified_int <- to_cell_type <- from_cell_type <- cell_type <- + int_cell_type <- NULL + nr_select <- int_nr_select <- nr_other <- int_nr_other <- unif_int <- NULL + + # select test to perform + diff_test <- match.arg( + arg = diff_test, + choices = c("permutation", "limma", "t.test", "wilcox")) + + # select subnetwork + sub_spatnetwork <- annot_spatnetwork[unified_int == sel_int] + + # unique cell types + unique_cell_types <- unique( + c(sub_spatnetwork$to_cell_type, sub_spatnetwork$from_cell_type)) + + if (length(unique_cell_types) == 2) { + first_cell_type <- unique_cell_types[1] + second_cell_type <- unique_cell_types[2] + + # first cell type ids + to1 <- sub_spatnetwork[to_cell_type == first_cell_type][["to"]] + from1 <- sub_spatnetwork[from_cell_type == first_cell_type][["from"]] + cell1_ids <- unique(c(to1, from1)) + + # second cell type ids + to2 <- sub_spatnetwork[to_cell_type == second_cell_type][["to"]] + from2 <- sub_spatnetwork[from_cell_type == second_cell_type][["from"]] + cell2_ids <- unique(c(to2, from2)) + + ## all cell ids + all_cell1 <- cell_metadata[get(cluster_column) == first_cell_type][[ + "cell_ID"]] + all_cell2 <- cell_metadata[get(cluster_column) == second_cell_type][[ + "cell_ID"]] + + ## exclude selected + if (exclude_selected_cells_from_test == TRUE) { + all_cell1 <- all_cell1[!all_cell1 %in% cell1_ids] + all_cell2 <- all_cell2[!all_cell2 %in% cell2_ids] + } + + ## FOR CELL TYPE 1 + sel_ind1 <- which(colnames(expr_values) %in% cell1_ids) + all_ind1 <- which(colnames(expr_values) %in% all_cell1) + + ## FOR CELL TYPE 2 + sel_ind2 <- which(colnames(expr_values) %in% cell2_ids) + all_ind2 <- which(colnames(expr_values) %in% all_cell2) + + + ## do not continue if too few cells ## + if (length(sel_ind1) < minimum_unique_cells | + length(all_ind1) < minimum_unique_cells | + length(sel_ind2) < minimum_unique_int_cells) { + result_cell_1 <- NULL + } else { + result_cell_1 <- .do_cell_proximity_test( + expr_values = expr_values, + select_ind = sel_ind1, + other_ind = all_ind1, + 
diff_test = diff_test, + n_perm = nr_permutations, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + set_seed = set_seed, + seed_number = seed_number + ) + result_cell_1[, cell_type := first_cell_type] + result_cell_1[, int_cell_type := second_cell_type] + result_cell_1[, nr_select := length(sel_ind1)] + result_cell_1[, int_nr_select := length(sel_ind2)] + result_cell_1[, nr_other := length(all_ind1)] + result_cell_1[, int_nr_other := length(all_ind2)] + } + + + ## do not continue if too few cells ## + if (length(sel_ind2) < minimum_unique_cells | + length(all_ind2) < minimum_unique_cells | + length(sel_ind1) < minimum_unique_int_cells) { + result_cell_2 <- NULL + } else { + result_cell_2 <- .do_cell_proximity_test( + expr_values = expr_values, + select_ind = sel_ind2, other_ind = all_ind2, + diff_test = diff_test, + n_perm = nr_permutations, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + set_seed = set_seed, + seed_number = seed_number + ) + result_cell_2[, cell_type := second_cell_type] + result_cell_2[, int_cell_type := first_cell_type] + result_cell_2[, nr_select := length(sel_ind2)] + result_cell_2[, int_nr_select := length(sel_ind1)] + result_cell_2[, nr_other := length(all_ind2)] + result_cell_2[, int_nr_other := length(all_ind1)] + } + + + ## COMBINE + + if (is.null(result_cell_1) & is.null(result_cell_2)) { + return(NULL) + } else { + result_cells <- rbind(result_cell_1, result_cell_2) + } + } else if (length(unique_cell_types) == 1) { + first_cell_type <- unique_cell_types[1] + + # first cell type ids + to1 <- sub_spatnetwork[to_cell_type == first_cell_type][["to"]] + from1 <- sub_spatnetwork[from_cell_type == first_cell_type][["from"]] + cell1_ids <- unique(c(to1, from1)) + + ## all cell ids + all_cell1 <- cell_metadata[get(cluster_column) == first_cell_type][[ + "cell_ID"]] + + ## exclude selected + if (exclude_selected_cells_from_test == TRUE) { + all_cell1 <- all_cell1[!all_cell1 %in% cell1_ids] + } + + ## FOR CELL TYPE 1 + sel_ind1 <- which(colnames(expr_values) %in% cell1_ids) + all_ind1 <- which(colnames(expr_values) %in% all_cell1) + + + ## do not continue if too few cells ## + if (length(sel_ind1) < minimum_unique_cells | + length(all_ind1) < minimum_unique_cells) { + return(NULL) + } + + result_cells <- .do_cell_proximity_test( + expr_values = expr_values, + select_ind = sel_ind1, other_ind = all_ind1, + diff_test = diff_test, + n_perm = nr_permutations, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + set_seed = set_seed, + seed_number = seed_number + ) + + result_cells[, cell_type := first_cell_type] + result_cells[, int_cell_type := first_cell_type] + result_cells[, nr_select := length(sel_ind1)] + result_cells[, int_nr_select := length(sel_ind1)] + result_cells[, nr_other := length(all_ind1)] + result_cells[, int_nr_other := length(all_ind1)] } - ## FOR CELL TYPE 1 - sel_ind1 = which(colnames(expr_values) %in% cell1_ids) - all_ind1 = which(colnames(expr_values) %in% all_cell1) - - - ## do not continue if too few cells ## - if(length(sel_ind1) < minimum_unique_cells | length(all_ind1) < minimum_unique_cells) { - return(NULL) - } - - #print('second') - - result_cells = .do_cell_proximity_test(expr_values = expr_values, - select_ind = sel_ind1, other_ind = all_ind1, - diff_test = diff_test, - n_perm = nr_permutations, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - set_seed = set_seed, - seed_number = seed_number) - - result_cells[, 
cell_type := first_cell_type] - result_cells[, int_cell_type := first_cell_type] - result_cells[, nr_select := length(sel_ind1)] - result_cells[, int_nr_select := length(sel_ind1)] - result_cells[, nr_other := length(all_ind1)] - result_cells[, int_nr_other := length(all_ind1)] - - } - - result_cells[, unif_int := sel_int] - - return(result_cells) + result_cells[, unif_int := sel_int] + return(result_cells) } @@ -946,7 +1039,8 @@ NULL #' @title findInteractionChangedFeats #' @name findInteractionChangedFeats #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. features that are differentially expressed due to proximity to other cell types.#' +#' i.e. features that are differentially expressed due to proximity to other +#' cell types. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -965,10 +1059,12 @@ NULL #' @param do_parallel run calculations in parallel with mclapply #' @param set_seed set a seed for reproducibility #' @param seed_number seed number -#' @return icfObject that contains the Interaction Changed differential feature scores -#' @details Function to calculate if features are differentially expressed in cell types -#' when they interact (approximated by physical proximity) with other cell types. -#' The results data.table in the icfObject contains - at least - the following columns: +#' @return icfObject that contains the Interaction Changed differential feature +#' scores +#' @details Function to calculate if features are differentially expressed in +#' cell types when they interact (approximated by physical proximity) with +#' other cell types. The results data.table in the icfObject contains +#' - at least - the following columns: #' \itemize{ #' \item{features:}{ All or selected list of tested features} #' \item{sel:}{ average feature expression in the interacting cells from the target cell type } @@ -986,162 +1082,170 @@ NULL #' \item{unif_int:}{ cell-cell interaction} #' } #' @export -findInteractionChangedFeats = function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = 'normalized', - selected_feats = NULL, - cluster_column, - spatial_network_name = 'Delaunay_network', - minimum_unique_cells = 1, - minimum_unique_int_cells = 1, - diff_test = c('permutation', 'limma', 't.test', 'wilcox'), - mean_method = c('arithmic', 'geometric'), - offset = 0.1, - adjust_method = c("bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "fdr", "none"), - nr_permutations = 1000, - exclude_selected_cells_from_test = T, - do_parallel = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') # as matrix (never set back) - - ## test selected feats ## - if(!is.null(selected_feats)) { - expr_values = expr_values[rownames(expr_values) %in% selected_feats, ] - } - - ## stop test selected feats ## - - # difference test - diff_test = match.arg(diff_test, choices = c('permutation', 'limma', 't.test', 'wilcox')) - - # p.adj test - adjust_method = match.arg(adjust_method, choices = c("bonferroni","BH", 
"holm", "hochberg", "hommel", - "BY", "fdr", "none")) - # how to calculate mean - mean_method = match.arg(mean_method, choices = c('arithmic', 'geometric')) - - ## metadata - cell_metadata = pDataDT(gobject, feat_type = feat_type) - - - - ## annotated spatial network - annot_spatnetwork = annotateSpatialNetwork(gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - # print(annot_spatnetwork) - all_interactions = unique(annot_spatnetwork$unified_int) - # print(all_interactions) - - - if(do_parallel == TRUE) { - - - fin_result = lapply_flex(X = all_interactions, future.seed=TRUE, FUN = function(x) { - - #print('first') - #print(x) - - tempres = .findCellProximityFeats_per_interaction(expr_values = expr_values, - cell_metadata = cell_metadata, - annot_spatnetwork = annot_spatnetwork, - minimum_unique_cells = minimum_unique_cells, - minimum_unique_int_cells = minimum_unique_int_cells, - sel_int = x, - cluster_column = cluster_column, - exclude_selected_cells_from_test = exclude_selected_cells_from_test, - diff_test = diff_test, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - nr_permutations = nr_permutations, - set_seed = set_seed, - seed_number = seed_number) - - - }) - +findInteractionChangedFeats <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + expression_values = "normalized", + selected_feats = NULL, + cluster_column, + spatial_network_name = "Delaunay_network", + minimum_unique_cells = 1, + minimum_unique_int_cells = 1, + diff_test = c("permutation", "limma", "t.test", "wilcox"), + mean_method = c("arithmic", "geometric"), + offset = 0.1, + adjust_method = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + ), + nr_permutations = 1000, + exclude_selected_cells_from_test = TRUE, + do_parallel = TRUE, + set_seed = TRUE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } else { + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) # as matrix (never set back) + + ## test selected feats ## + if (!is.null(selected_feats)) { + expr_values <- expr_values[rownames(expr_values) %in% selected_feats, ] + } - fin_result = list() + ## stop test selected feats ## - for(i in seq_along(all_interactions)) { + # difference test + diff_test <- match.arg( + diff_test, choices = c("permutation", "limma", "t.test", "wilcox")) - x = all_interactions[i] + # p.adj test + adjust_method <- match.arg(adjust_method, choices = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + )) + # how to calculate mean + mean_method <- match.arg(mean_method, choices = c("arithmic", "geometric")) + ## metadata + cell_metadata <- pDataDT(gobject, feat_type = feat_type) - tempres = .findCellProximityFeats_per_interaction(expr_values = expr_values, - cell_metadata = cell_metadata, - annot_spatnetwork = annot_spatnetwork, - minimum_unique_cells = minimum_unique_cells, - minimum_unique_int_cells = minimum_unique_int_cells, - sel_int = x, - cluster_column = cluster_column, - exclude_selected_cells_from_test = 
exclude_selected_cells_from_test, - diff_test = diff_test, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - nr_permutations = nr_permutations, - set_seed = set_seed, - seed_number = seed_number) - fin_result[[i]] = tempres + ## annotated spatial network + annot_spatnetwork <- annotateSpatialNetwork(gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) + + all_interactions <- unique(annot_spatnetwork$unified_int) + + if (do_parallel == TRUE) { + fin_result <- lapply_flex( + X = all_interactions, future.seed = TRUE, FUN = function(x) { + + tempres <- .findCellProximityFeats_per_interaction( + expr_values = expr_values, + cell_metadata = cell_metadata, + annot_spatnetwork = annot_spatnetwork, + minimum_unique_cells = minimum_unique_cells, + minimum_unique_int_cells = minimum_unique_int_cells, + sel_int = x, + cluster_column = cluster_column, + exclude_selected_cells_from_test = exclude_selected_cells_from_test, + diff_test = diff_test, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + nr_permutations = nr_permutations, + set_seed = set_seed, + seed_number = seed_number + ) + }) + } else { + fin_result <- list() + + for (i in seq_along(all_interactions)) { + x <- all_interactions[i] + + + tempres <- .findCellProximityFeats_per_interaction( + expr_values = expr_values, + cell_metadata = cell_metadata, + annot_spatnetwork = annot_spatnetwork, + minimum_unique_cells = minimum_unique_cells, + minimum_unique_int_cells = minimum_unique_int_cells, + sel_int = x, + cluster_column = cluster_column, + exclude_selected_cells_from_test = exclude_selected_cells_from_test, + diff_test = diff_test, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + nr_permutations = nr_permutations, + set_seed = set_seed, + seed_number = seed_number + ) + + fin_result[[i]] <- tempres + } } - - } - - final_result = do.call('rbind', fin_result) + final_result <- do.call("rbind", fin_result) - # data.table variables - spec_int = cell_type = int_cell_type = type_int = NULL + # data.table variables + spec_int <- cell_type <- int_cell_type <- type_int <- NULL - final_result[, spec_int := paste0(cell_type,'--',int_cell_type)] - final_result[, type_int := ifelse(cell_type == int_cell_type, 'homo', 'hetero')] + final_result[, spec_int := paste0(cell_type, "--", int_cell_type)] + final_result[, type_int := ifelse( + cell_type == int_cell_type, "homo", "hetero")] - #return(final_result) + # return(final_result) - permutation_test = ifelse(diff_test == 'permutation', nr_permutations, 'no permutations') - - icfObject = list(ICFscores = final_result, - Giotto_info = list('values' = values, - 'cluster' = cluster_column, - 'spatial network' = spatial_network_name), - test_info = list('test' = diff_test, - 'p.adj' = adjust_method, - 'min cells' = minimum_unique_cells, - 'min interacting cells' = minimum_unique_int_cells, - 'exclude selected cells' = exclude_selected_cells_from_test, - 'perm' = permutation_test)) - class(icfObject) = append('icfObject' ,class(icfObject)) - return(icfObject) + permutation_test <- ifelse( + diff_test == "permutation", nr_permutations, "no permutations") + icfObject <- list( + ICFscores = final_result, + Giotto_info = list( + "values" = values, + "cluster" = cluster_column, + "spatial network" = spatial_network_name + ), + test_info = list( + "test" = diff_test, + "p.adj" = adjust_method, + "min cells" = minimum_unique_cells, 
+ "min interacting cells" = minimum_unique_int_cells, + "exclude selected cells" = exclude_selected_cells_from_test, + "perm" = permutation_test + ) + ) + class(icfObject) <- append("icfObject", class(icfObject)) + return(icfObject) } @@ -1149,16 +1253,15 @@ findInteractionChangedFeats = function(gobject, #' @title findInteractionChangedGenes #' @name findInteractionChangedGenes #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. genes that are differentially expressed due to interactions with other cell types. +#' i.e. genes that are differentially expressed due to interactions with other +#' cell types. #' @param ... params to pass to \code{findInteractionChangedFeats} #' @seealso \code{\link{findInteractionChangedFeats}} #' @export -findInteractionChangedGenes = function(...) { - - .Deprecated(new = "findInteractionChangedFeats") - - findInteractionChangedFeats(...) +findInteractionChangedGenes <- function(...) { + .Deprecated(new = "findInteractionChangedFeats") + findInteractionChangedFeats(...) } @@ -1166,16 +1269,15 @@ findInteractionChangedGenes = function(...) { #' @title findCellProximityGenes #' @name findCellProximityGenes #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. genes that are differentially expressed due to proximity to other cell types. +#' i.e. genes that are differentially expressed due to proximity to other cell +#' types. #' @inheritDotParams findInteractionChangedFeats #' @seealso \code{\link{findInteractionChangedFeats}} #' @export findCellProximityGenes <- function(...) { + .Deprecated(new = "findInteractionChangedFeats") - .Deprecated(new = "findInteractionChangedFeats") - - findInteractionChangedFeats(...) - + findInteractionChangedFeats(...) } @@ -1185,7 +1287,8 @@ findCellProximityGenes <- function(...) { #' @title findICF #' @name findICF #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. features that are differentially expressed due to proximity to other cell types.#' +#' i.e. features that are differentially expressed due to proximity to other +#' cell types. #' @param gobject giotto object #' @param feat_type feature type #' @param spat_unit spatial unit @@ -1204,10 +1307,12 @@ findCellProximityGenes <- function(...) { #' @param do_parallel run calculations in parallel with mclapply #' @param set_seed set a seed for reproducibility #' @param seed_number seed number -#' @return icfObject that contains the Interaction Changed differential gene scores -#' @details Function to calculate if genes are differentially expressed in cell types -#' when they interact (approximated by physical proximity) with other cell types. -#' The results data.table in the icfObject contains - at least - the following columns: +#' @return icfObject that contains the Interaction Changed differential gene +#' scores +#' @details Function to calculate if genes are differentially expressed in +#' cell types when they interact (approximated by physical proximity) with +#' other cell types. The results data.table in the icfObject contains +#' - at least - the following columns: #' \itemize{ #' \item{features:}{ All or selected list of tested features} #' \item{sel:}{ average feature expression in the interacting cells from the target cell type } @@ -1226,47 +1331,47 @@ findCellProximityGenes <- function(...) 
{ #' } #' @seealso \code{\link{findInteractionChangedFeats}} #' @export -findICF = function(gobject, - feat_type = NULL, - spat_unit = NULL, - expression_values = 'normalized', - selected_feats = NULL, - cluster_column, - spatial_network_name = 'Delaunay_network', - minimum_unique_cells = 1, - minimum_unique_int_cells = 1, - diff_test = c('permutation', 'limma', 't.test', 'wilcox'), - mean_method = c('arithmic', 'geometric'), - offset = 0.1, - adjust_method = c("bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "fdr", "none"), - nr_permutations = 100, - exclude_selected_cells_from_test = T, - do_parallel = TRUE, - set_seed = TRUE, - seed_number = 1234) { - - - findInteractionChangedFeats(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - expression_values = expression_values, - selected_feats = selected_feats, - cluster_column = cluster_column, - spatial_network_name = spatial_network_name, - minimum_unique_cells = minimum_unique_cells, - minimum_unique_int_cells = minimum_unique_int_cells, - diff_test = diff_test, - mean_method = mean_method, - offset = offset, - adjust_method = adjust_method, - nr_permutations = nr_permutations, - exclude_selected_cells_from_test = exclude_selected_cells_from_test, - do_parallel = do_parallel, - set_seed = set_seed, - seed_number = seed_number) - - +findICF <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + expression_values = "normalized", + selected_feats = NULL, + cluster_column, + spatial_network_name = "Delaunay_network", + minimum_unique_cells = 1, + minimum_unique_int_cells = 1, + diff_test = c("permutation", "limma", "t.test", "wilcox"), + mean_method = c("arithmic", "geometric"), + offset = 0.1, + adjust_method = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + ), + nr_permutations = 100, + exclude_selected_cells_from_test = TRUE, + do_parallel = TRUE, + set_seed = TRUE, + seed_number = 1234) { + findInteractionChangedFeats( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + expression_values = expression_values, + selected_feats = selected_feats, + cluster_column = cluster_column, + spatial_network_name = spatial_network_name, + minimum_unique_cells = minimum_unique_cells, + minimum_unique_int_cells = minimum_unique_int_cells, + diff_test = diff_test, + mean_method = mean_method, + offset = offset, + adjust_method = adjust_method, + nr_permutations = nr_permutations, + exclude_selected_cells_from_test = exclude_selected_cells_from_test, + do_parallel = do_parallel, + set_seed = set_seed, + seed_number = seed_number + ) } @@ -1275,16 +1380,15 @@ findICF = function(gobject, #' @title findICG #' @name findICG #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. genes that are differentially expressed due to interaction with other cell types. +#' i.e. genes that are differentially expressed due to interaction with other +#' cell types. #' @inheritDotParams findICF #' @seealso \code{\link{findICF}} #' @export -findICG = function(...) { - - .Deprecated(new = "findICF") - - findICF(...) +findICG <- function(...) { + .Deprecated(new = "findICF") + findICF(...) } @@ -1292,16 +1396,15 @@ findICG = function(...) { #' @title findCPG #' @name findCPG #' @description Identifies cell-to-cell Interaction Changed Features (ICF), -#' i.e. genes that are differentially expressed due to proximity to other cell types. +#' i.e. genes that are differentially expressed due to proximity to other cell +#' types. 
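findICF forwards every argument to findInteractionChangedFeats, so a typical call only needs a cluster column and an annotated spatial network. A hedged usage sketch, assuming g is an existing Giotto object with normalized expression, a "cell_types" metadata column (hypothetical name), and the default "Delaunay_network"; the filtering thresholds shown are illustrative, not recommendations:

# assumes `g` is an existing Giotto object with a "cell_types" metadata
# column and the default "Delaunay_network" spatial network (hypothetical setup)
icf <- findInteractionChangedFeats(
    gobject = g,
    cluster_column = "cell_types",
    spatial_network_name = "Delaunay_network",
    diff_test = "permutation",
    nr_permutations = 1000
)

# keep only well-supported interaction-changed features
icf_filtered <- filterInteractionChangedFeats(
    icf,
    min_cells = 4,
    min_int_cells = 4,
    min_fdr = 0.1,
    min_log2_fc = 0.2
)

# the scores live in the "ICFscores" data.table of the icfObject
head(icf_filtered$ICFscores)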
#' @inheritDotParams findICF #' @seealso \code{\link{findICF}} #' @export findCPG <- function(...) { + .Deprecated(new = "findICF") - .Deprecated(new = "findICF") - - findICF(...) - + findICF(...) } @@ -1314,7 +1417,8 @@ findCPG <- function(...) { #' @param min_cells minimum number of source cell type #' @param min_cells_expr minimum expression level for source cell type #' @param min_int_cells minimum number of interacting neighbor cell type -#' @param min_int_cells_expr minimum expression level for interacting neighbor cell type +#' @param min_int_cells_expr minimum expression level for interacting neighbor +#' cell type #' @param min_fdr minimum adjusted p-value #' @param min_spat_diff minimum absolute spatial expression difference #' @param min_log2_fc minimum log2 fold-change @@ -1323,62 +1427,71 @@ findCPG <- function(...) { #' @param direction differential expression directions to keep #' @return icfObject that contains the filtered differential feature scores #' @export -filterInteractionChangedFeats = function(icfObject, - min_cells = 4, - min_cells_expr = 1, - min_int_cells = 4, - min_int_cells_expr = 1, - min_fdr = 0.1, - min_spat_diff = 0.2, - min_log2_fc = 0.2, - min_zscore = 2, - zscores_column = c('cell_type', 'feats'), - direction = c('both', 'up', 'down')) { - - # data.table variables - nr_select = int_nr_select = zscores = log2fc = sel = other = p.adj = NULL - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - zscores_column = match.arg(zscores_column, choices = c('cell_type', 'feats')) +filterInteractionChangedFeats <- function(icfObject, + min_cells = 4, + min_cells_expr = 1, + min_int_cells = 4, + min_int_cells_expr = 1, + min_fdr = 0.1, + min_spat_diff = 0.2, + min_log2_fc = 0.2, + min_zscore = 2, + zscores_column = c("cell_type", "feats"), + direction = c("both", "up", "down")) { + # data.table variables + nr_select <- int_nr_select <- zscores <- log2fc <- sel <- other <- + p.adj <- NULL + + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") + } - ICFscore = copy(icfObject[['ICFscores']]) + zscores_column <- match.arg( + zscores_column, choices = c("cell_type", "feats")) - # other parameters - direction = match.arg(direction, choices = c('both', 'up', 'down')) + ICFscore <- copy(icfObject[["ICFscores"]]) + # other parameters + direction <- match.arg(direction, choices = c("both", "up", "down")) - ## sequential filter steps ## - # 1. minimum number of source and target cells - selection_scores = ICFscore[nr_select >= min_cells & int_nr_select >= min_int_cells] - # 2. create z-scores for log2fc per cell type - selection_scores[, zscores := scale(log2fc), by = c(zscores_column)] + ## sequential filter steps ## + # 1. minimum number of source and target cells + selection_scores <- ICFscore[nr_select >= min_cells & + int_nr_select >= min_int_cells] - # 3. filter based on z-scores and minimum levels - comb_DT = rbind(selection_scores[zscores >= min_zscore & abs(diff) >= min_spat_diff & log2fc >= min_log2_fc & sel >= min_cells_expr], - selection_scores[zscores <= -min_zscore & abs(diff) >= min_spat_diff & log2fc <= -min_log2_fc & other >= min_int_cells_expr]) + # 2. create z-scores for log2fc per cell type + selection_scores[, zscores := scale(log2fc), by = c(zscores_column)] - # 4. filter based on adjusted p-value (fdr) - comb_DT = comb_DT[p.adj < min_fdr] + # 3. 
filter based on z-scores and minimum levels + comb_DT <- rbind( + selection_scores[zscores >= min_zscore & + abs(diff) >= min_spat_diff & + log2fc >= min_log2_fc & sel >= min_cells_expr], + selection_scores[zscores <= -min_zscore & + abs(diff) >= min_spat_diff & + log2fc <= -min_log2_fc & + other >= min_int_cells_expr] + ) + # 4. filter based on adjusted p-value (fdr) + comb_DT <- comb_DT[p.adj < min_fdr] - if(direction == 'both') { - selection_scores = selection_scores - } else if(direction == 'up') { - selection_scores = selection_scores[log2fc >= min_log2_fc] - } else if(direction == 'down') { - selection_scores = selection_scores[log2fc <= -min_log2_fc] - } + if (direction == "both") { + selection_scores <- selection_scores + } else if (direction == "up") { + selection_scores <- selection_scores[log2fc >= min_log2_fc] + } else if (direction == "down") { + selection_scores <- selection_scores[log2fc <= -min_log2_fc] + } - newobj = copy(icfObject) - newobj[['ICFscores']] = comb_DT - return(newobj) + newobj <- copy(icfObject) + newobj[["ICFscores"]] <- comb_DT + return(newobj) } @@ -1388,12 +1501,10 @@ filterInteractionChangedFeats = function(icfObject, #' @inheritDotParams filterInteractionChangedFeats #' @seealso \code{\link{filterInteractionChangedFeats}} #' @export -filterInteractionChangedGenes = function(...) { - - .Deprecated(new = "filterInteractionChangedFeats") - - filterInteractionChangedFeats(...) +filterInteractionChangedGenes <- function(...) { + .Deprecated(new = "filterInteractionChangedFeats") + filterInteractionChangedFeats(...) } @@ -1404,11 +1515,9 @@ filterInteractionChangedGenes = function(...) { #' @seealso \code{\link{filterInteractionChangedFeats}} #' @export filterCellProximityGenes <- function(...) { + .Deprecated(new = "filterInteractionChangedFeats") - .Deprecated(new = "filterInteractionChangedFeats") - - filterInteractionChangedFeats(...) - + filterInteractionChangedFeats(...) } @@ -1422,7 +1531,8 @@ filterCellProximityGenes <- function(...) { #' @param min_cells minimum number of source cell type #' @param min_cells_expr minimum expression level for source cell type #' @param min_int_cells minimum number of interacting neighbor cell type -#' @param min_int_cells_expr minimum expression level for interacting neighbor cell type +#' @param min_int_cells_expr minimum expression level for interacting neighbor +#' cell type #' @param min_fdr minimum adjusted p-value #' @param min_spat_diff minimum absolute spatial expression difference #' @param min_log2_fc minimum log2 fold-change @@ -1431,30 +1541,30 @@ filterCellProximityGenes <- function(...) 
{ #' @param direction differential expression directions to keep #' @return icfObject that contains the filtered differential feature scores #' @export -filterICF = function(icfObject, - min_cells = 4, - min_cells_expr = 1, - min_int_cells = 4, - min_int_cells_expr = 1, - min_fdr = 0.1, - min_spat_diff = 0.2, - min_log2_fc = 0.2, - min_zscore = 2, - zscores_column = c('cell_type', 'feats'), - direction = c('both', 'up', 'down')) { - - filterInteractionChangedFeats(icfObject = icfObject, - min_cells = min_cells, - min_cells_expr = min_cells_expr, - min_int_cells = min_int_cells, - min_int_cells_expr = min_int_cells_expr, - min_fdr = min_fdr, - min_spat_diff = min_spat_diff, - min_log2_fc = min_log2_fc, - min_zscore = min_zscore, - zscores_column = zscores_column, - direction = direction) - +filterICF <- function(icfObject, + min_cells = 4, + min_cells_expr = 1, + min_int_cells = 4, + min_int_cells_expr = 1, + min_fdr = 0.1, + min_spat_diff = 0.2, + min_log2_fc = 0.2, + min_zscore = 2, + zscores_column = c("cell_type", "feats"), + direction = c("both", "up", "down")) { + filterInteractionChangedFeats( + icfObject = icfObject, + min_cells = min_cells, + min_cells_expr = min_cells_expr, + min_int_cells = min_int_cells, + min_int_cells_expr = min_int_cells_expr, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc, + min_zscore = min_zscore, + zscores_column = zscores_column, + direction = direction + ) } @@ -1466,12 +1576,10 @@ filterICF = function(icfObject, #' @inheritDotParams filterICF #' @seealso \code{\link{filterICF}} #' @export -filterICG = function(...) { - - .Deprecated(new = "filterICF") - - filterICF(...) +filterICG <- function(...) { + .Deprecated(new = "filterICF") + filterICF(...) } @@ -1483,11 +1591,9 @@ filterICG = function(...) { #' @seealso \code{\link{filterICF}} #' @export filterCPG <- function(...) { + .Deprecated(new = "filterICF") - .Deprecated(new = "filterICF") - - filterICF(...) - + filterICF(...) } @@ -1500,274 +1606,338 @@ filterCPG <- function(...) 
{ #' @name .combineInteractionChangedFeatures_per_interaction #' @description Combine ICF scores per interaction #' @keywords internal -.combineInteractionChangedFeatures_per_interaction = function(icfObject, - sel_int, - selected_feats = NULL, - specific_feats_1 = NULL, - specific_feats_2 = NULL, - min_cells = 5, - min_int_cells = 3, - min_fdr = 0.05, - min_spat_diff = 0, - min_log2_fc = 0.5) { - - # data.table variables - unif_int = feats = cell_type = p.adj = nr_select = int_nr_select = log2fc = sel = NULL - other = p.value = perm_sel = perm_other = perm_log2fc = perm_diff = NULL - int_cell_type = nr_other = feats_combo = feats_1 = feats_2 = type_int = NULL - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - ICFscore = copy(icfObject[['ICFscores']]) - - test_used = icfObject[['test_info']][['test']] - - ## subset on selected interaction - subset = ICFscore[unif_int == sel_int] - - ## type of interactions - type_interaction = unique(subset[['type_int']]) - - ## first filtering ICFscores on feats - if((!is.null(specific_feats_1) & !is.null(specific_feats_2))) { - if(length(specific_feats_1) != length(specific_feats_2)) { - stop('\n specific_feats_1 must have the same length as specific_feats_2') +.combineInteractionChangedFeatures_per_interaction <- function(icfObject, + sel_int, + selected_feats = NULL, + specific_feats_1 = NULL, + specific_feats_2 = NULL, + min_cells = 5, + min_int_cells = 3, + min_fdr = 0.05, + min_spat_diff = 0, + min_log2_fc = 0.5) { + # data.table variables + unif_int <- feats <- cell_type <- p.adj <- nr_select <- + int_nr_select <- log2fc <- sel <- NULL + other <- p.value <- perm_sel <- perm_other <- perm_log2fc <- + perm_diff <- NULL + int_cell_type <- nr_other <- feats_combo <- feats_1 <- feats_2 <- + type_int <- NULL + + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") } - subset = subset[feats %in% c(specific_feats_1, specific_feats_2)] - } else if(!is.null(selected_feats)) { - subset = subset[feats %in% c(selected_feats)] - } - - ## find number of unique cell types - unique_cell_types = unique(c(subset$cell_type, subset$int_cell_type)) - - if(length(unique_cell_types) == 2) { - - ## CELL TYPE 1 - subset_cell_1 = subset[cell_type == unique_cell_types[1]] - - if(nrow(subset_cell_1) == 0) { - - if(test_used == 'permutation') { - - subset_cell_1 = data.table::data.table('feats_1' = subset[['feats']], - 'sel_1' = NA, - 'other_1' = NA, - 'log2fc_1' = NA, - 'diff_1' = NA, - 'p.value_1' = NA, - 'p.adj_1' = NA, - 'perm_sel_1' = NA, - 'perm_other_1' = NA, - 'perm_log2fc_1' = NA, - 'perm_diff_1' = NA, - 'cell_type_1' = unique_cell_types[1], - 'int_cell_type_1' = unique_cell_types[2], - 'nr_select_1' = NA, - 'nr_other_1' = NA, - 'unif_int' = subset[['unif_int']]) - - } else { - - subset_cell_1 = data.table::data.table('feats_1' = subset[['feats']], - 'sel_1' = NA, - 'other_1' = NA, - 'log2fc_1' = NA, - 'diff_1' = NA, - 'p.value_1' = NA, - 'p.adj_1' = NA, - 'cell_type_1' = unique_cell_types[1], - 'int_cell_type_1' = unique_cell_types[2], - 'nr_select_1' = NA, - 'nr_other_1' = NA, - 'unif_int' = subset[['unif_int']]) - } + ICFscore <- copy(icfObject[["ICFscores"]]) - } else { + test_used <- icfObject[["test_info"]][["test"]] + ## subset on selected interaction + subset <- ICFscore[unif_int == sel_int] - # filter on statistics - subset_cell_1 = subset_cell_1[p.adj <= min_fdr] - subset_cell_1 = 
subset_cell_1[nr_select >= min_cells] - subset_cell_1 = subset_cell_1[int_nr_select >= min_int_cells] - subset_cell_1 = subset_cell_1[abs(log2fc) >= min_log2_fc] - subset_cell_1 = subset_cell_1[abs(diff) >= min_spat_diff] - - if(test_used == 'permutation') { - # make it specific - subset_cell_1 = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, - perm_sel, perm_other, perm_log2fc, perm_diff, - cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', - 'perm_sel', 'perm_other', 'perm_log2fc', 'perm_diff', - 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_1', 'sel_1', 'other_1', 'log2fc_1', 'diff_1', 'p.value_1', 'p.adj_1', - 'perm_sel_1', 'perm_other_1', 'perm_log2fc_1', 'perm_diff_1', - 'cell_type_1', 'int_cell_type_1', 'nr_select_1', 'nr_other_1')) - - } else { - # make it specific - subset_cell_1 = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_1', 'sel_1', 'other_1', 'log2fc_1', 'diff_1', 'p.value_1', 'p.adj_1', 'cell_type_1', 'int_cell_type_1', 'nr_select_1', 'nr_other_1')) - - } - + ## type of interactions + type_interaction <- unique(subset[["type_int"]]) + ## first filtering ICFscores on feats + if ((!is.null(specific_feats_1) & !is.null(specific_feats_2))) { + if (length(specific_feats_1) != length(specific_feats_2)) { + stop("specific_feats_1 must have the same length as + specific_feats_2") + } + subset <- subset[feats %in% c(specific_feats_1, specific_feats_2)] + } else if (!is.null(selected_feats)) { + subset <- subset[feats %in% c(selected_feats)] } + ## find number of unique cell types + unique_cell_types <- unique(c(subset$cell_type, subset$int_cell_type)) + + if (length(unique_cell_types) == 2) { + ## CELL TYPE 1 + subset_cell_1 <- subset[cell_type == unique_cell_types[1]] + + if (nrow(subset_cell_1) == 0) { + if (test_used == "permutation") { + subset_cell_1 <- data.table::data.table( + "feats_1" = subset[["feats"]], + "sel_1" = NA, + "other_1" = NA, + "log2fc_1" = NA, + "diff_1" = NA, + "p.value_1" = NA, + "p.adj_1" = NA, + "perm_sel_1" = NA, + "perm_other_1" = NA, + "perm_log2fc_1" = NA, + "perm_diff_1" = NA, + "cell_type_1" = unique_cell_types[1], + "int_cell_type_1" = unique_cell_types[2], + "nr_select_1" = NA, + "nr_other_1" = NA, + "unif_int" = subset[["unif_int"]] + ) + } else { + subset_cell_1 <- data.table::data.table( + "feats_1" = subset[["feats"]], + "sel_1" = NA, + "other_1" = NA, + "log2fc_1" = NA, + "diff_1" = NA, + "p.value_1" = NA, + "p.adj_1" = NA, + "cell_type_1" = unique_cell_types[1], + "int_cell_type_1" = unique_cell_types[2], + "nr_select_1" = NA, + "nr_other_1" = NA, + "unif_int" = subset[["unif_int"]] + ) + } + } else { + # filter on statistics + subset_cell_1 <- subset_cell_1[p.adj <= min_fdr] + subset_cell_1 <- subset_cell_1[nr_select >= min_cells] + subset_cell_1 <- subset_cell_1[int_nr_select >= min_int_cells] + subset_cell_1 <- subset_cell_1[abs(log2fc) >= min_log2_fc] + subset_cell_1 <- subset_cell_1[abs(diff) >= min_spat_diff] + + if (test_used == "permutation") { + # make it specific + subset_cell_1 <- subset_cell_1[, .( + feats, sel, other, log2fc, diff, p.value, p.adj, + perm_sel, perm_other, perm_log2fc, perm_diff, + cell_type, 
int_cell_type, nr_select, nr_other, unif_int + )] + data.table::setnames(subset_cell_1, + old = c( + "feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", + "perm_sel", "perm_other", "perm_log2fc", "perm_diff", + "cell_type", "int_cell_type", "nr_select", "nr_other" + ), + new = c( + "feats_1", "sel_1", "other_1", "log2fc_1", "diff_1", + "p.value_1", "p.adj_1", + "perm_sel_1", "perm_other_1", "perm_log2fc_1", + "perm_diff_1", + "cell_type_1", "int_cell_type_1", "nr_select_1", + "nr_other_1" + ) + ) + } else { + # make it specific + subset_cell_1 <- subset_cell_1[ + , .(feats, sel, other, log2fc, diff, p.value, p.adj, + cell_type, int_cell_type, nr_select, nr_other, + unif_int)] + data.table::setnames(subset_cell_1, + old = c("feats", "sel", "other", "log2fc", "diff", + "p.value", "p.adj", "cell_type", "int_cell_type", + "nr_select", "nr_other"), + new = c("feats_1", "sel_1", "other_1", "log2fc_1", + "diff_1", "p.value_1", "p.adj_1", "cell_type_1", + "int_cell_type_1", "nr_select_1", "nr_other_1") + ) + } + } + + + + + ## CELL TYPE 2 + subset_cell_2 <- subset[cell_type == unique_cell_types[2]] + + if (nrow(subset_cell_2) == 0) { + if (test_used == "permutation") { + subset_cell_2 <- data.table::data.table( + "feats_2" = subset[["feats"]], + "sel_2" = NA, + "other_2" = NA, + "log2fc_2" = NA, + "diff_2" = NA, + "p.value_2" = NA, + "p.adj_2" = NA, + "perm_sel_2" = NA, + "perm_other_2" = NA, + "perm_log2fc_2" = NA, + "perm_diff_2" = NA, + "cell_type_2" = unique_cell_types[2], + "int_cell_type_2" = unique_cell_types[1], + "nr_select_2" = NA, + "nr_other_2" = NA, + "unif_int" = subset[["unif_int"]] + ) + } else { + subset_cell_2 <- data.table::data.table( + "feats_2" = subset[["feats"]], + "sel_2" = NA, + "other_2" = NA, + "log2fc_2" = NA, + "diff_2" = NA, + "p.value_2" = NA, + "p.adj_2" = NA, + "cell_type_2" = unique_cell_types[2], + "int_cell_type_2" = unique_cell_types[1], + "nr_select_2" = NA, + "nr_other_2" = NA, + "unif_int" = subset[["unif_int"]] + ) + } + } else { + # filter on statistics + subset_cell_2 <- subset_cell_2[p.adj <= min_fdr] + subset_cell_2 <- subset_cell_2[nr_select >= min_cells] + subset_cell_2 <- subset_cell_2[int_nr_select >= min_int_cells] + subset_cell_2 <- subset_cell_2[abs(log2fc) >= min_log2_fc] + subset_cell_2 <- subset_cell_2[abs(diff) >= min_spat_diff] + + if (test_used == "permutation") { + subset_cell_2 <- subset_cell_2[, .( + feats, sel, other, log2fc, diff, p.value, p.adj, + perm_sel, perm_other, perm_log2fc, perm_diff, + cell_type, int_cell_type, nr_select, nr_other, unif_int + )] + data.table::setnames(subset_cell_2, + old = c( + "feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", + "perm_sel", "perm_other", "perm_log2fc", "perm_diff", + "cell_type", "int_cell_type", "nr_select", "nr_other" + ), + new = c( + "feats_2", "sel_2", "other_2", "log2fc_2", "diff_2", + "p.value_2", "p.adj_2", + "perm_sel_2", "perm_other_2", "perm_log2fc_2", + "perm_diff_2", + "cell_type_2", "int_cell_type_2", "nr_select_2", + "nr_other_2" + ) + ) + } else { + subset_cell_2 <- subset_cell_2[ + , .(feats, sel, other, log2fc, diff, p.value, p.adj, + cell_type, int_cell_type, nr_select, nr_other, + unif_int)] + data.table::setnames(subset_cell_2, + old = c("feats", "sel", "other", "log2fc", "diff", + "p.value", "p.adj", "cell_type", "int_cell_type", + "nr_select", "nr_other"), + new = c("feats_2", "sel_2", "other_2", "log2fc_2", + "diff_2", "p.value_2", "p.adj_2", "cell_type_2", + "int_cell_type_2", "nr_select_2", "nr_other_2") + ) + } + } + + merge_subsets <- 
data.table::merge.data.table( + subset_cell_1, subset_cell_2, by = c("unif_int"), + allow.cartesian = TRUE) + } else if (length(unique_cell_types) == 1) { + ## CELL TYPE 1 + subset_cell_1 <- subset[cell_type == unique_cell_types[1]] + + # filter on statistics + subset_cell_1 <- subset_cell_1[p.adj <= min_fdr] + subset_cell_1 <- subset_cell_1[nr_select >= min_cells] + subset_cell_1 <- subset_cell_1[int_nr_select >= min_int_cells] + subset_cell_1 <- subset_cell_1[abs(log2fc) >= min_log2_fc] + subset_cell_1 <- subset_cell_1[abs(diff) >= min_spat_diff] + # make it specific - - ## CELL TYPE 2 - subset_cell_2 = subset[cell_type == unique_cell_types[2]] - - if(nrow(subset_cell_2) == 0) { - - if(test_used == 'permutation') { - - subset_cell_2 = data.table::data.table('feats_2' = subset[['feats']], - 'sel_2' = NA, - 'other_2' = NA, - 'log2fc_2' = NA, - 'diff_2' = NA, - 'p.value_2' = NA, - 'p.adj_2' = NA, - 'perm_sel_2' = NA, - 'perm_other_2' = NA, - 'perm_log2fc_2' = NA, - 'perm_diff_2' = NA, - 'cell_type_2' = unique_cell_types[2], - 'int_cell_type_2' = unique_cell_types[1], - 'nr_select_2' = NA, - 'nr_other_2' = NA, - 'unif_int' = subset[['unif_int']]) - } else { - - subset_cell_2 = data.table::data.table('feats_2' = subset[['feats']], - 'sel_2' = NA, - 'other_2' = NA, - 'log2fc_2' = NA, - 'diff_2' = NA, - 'p.value_2' = NA, - 'p.adj_2' = NA, - 'cell_type_2' = unique_cell_types[2], - 'int_cell_type_2' = unique_cell_types[1], - 'nr_select_2' = NA, - 'nr_other_2' = NA, - 'unif_int' = subset[['unif_int']]) - } - - - - } else { - - # filter on statistics - subset_cell_2 = subset_cell_2[p.adj <= min_fdr] - subset_cell_2 = subset_cell_2[nr_select >= min_cells] - subset_cell_2 = subset_cell_2[int_nr_select >= min_int_cells] - subset_cell_2 = subset_cell_2[abs(log2fc) >= min_log2_fc] - subset_cell_2 = subset_cell_2[abs(diff) >= min_spat_diff] - - if(test_used == 'permutation') { - - subset_cell_2 = subset_cell_2[,.(feats, sel, other, log2fc, diff, p.value, p.adj, - perm_sel, perm_other, perm_log2fc, perm_diff, - cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_2, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', - 'perm_sel', 'perm_other', 'perm_log2fc', 'perm_diff', - 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_2', 'sel_2', 'other_2', 'log2fc_2', 'diff_2', 'p.value_2', 'p.adj_2', - 'perm_sel_2', 'perm_other_2', 'perm_log2fc_2', 'perm_diff_2', - 'cell_type_2', 'int_cell_type_2', 'nr_select_2', 'nr_other_2')) - - - } else { - subset_cell_2 = subset_cell_2[,.(feats, sel, other, log2fc, diff, p.value, p.adj, cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_2, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_2', 'sel_2', 'other_2', 'log2fc_2', 'diff_2', 'p.value_2', 'p.adj_2', 'cell_type_2', 'int_cell_type_2', 'nr_select_2', 'nr_other_2')) - - } - - - - } - - merge_subsets = data.table::merge.data.table(subset_cell_1, subset_cell_2, by = c('unif_int'), allow.cartesian = TRUE) - - } else if(length(unique_cell_types) == 1) { - - ## CELL TYPE 1 - subset_cell_1 = subset[cell_type == unique_cell_types[1]] - - # filter on statistics - subset_cell_1 = subset_cell_1[p.adj <= min_fdr] - subset_cell_1 = subset_cell_1[nr_select >= min_cells] - subset_cell_1 = subset_cell_1[int_nr_select >= min_int_cells] - subset_cell_1 = subset_cell_1[abs(log2fc) >= min_log2_fc] - subset_cell_1 = 
subset_cell_1[abs(diff) >= min_spat_diff] - - # make it specific - - if(test_used == 'permutation') { - subset_cell_1A = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, - perm_sel, perm_other, perm_log2fc, perm_diff, - cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1A, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', - 'perm_sel', 'perm_other', 'perm_log2fc', 'perm_diff', - 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_1', 'sel_1', 'other_1', 'log2fc_1', 'diff_1', 'p.value_1', 'p.adj_1', - 'perm_sel_1', 'perm_other_1', 'perm_log2fc_1', 'perm_diff_1', - 'cell_type_1', 'int_cell_type_1', 'nr_select_1', 'nr_other_1')) - - } else { - subset_cell_1A = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1A, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_1', 'sel_1', 'other_1', 'log2fc_1', 'diff_1', 'p.value_1', 'p.adj_1', 'cell_type_1', 'int_cell_type_1', 'nr_select_1', 'nr_other_1')) - + if (test_used == "permutation") { + subset_cell_1A <- subset_cell_1[, .( + feats, sel, other, log2fc, diff, p.value, p.adj, + perm_sel, perm_other, perm_log2fc, perm_diff, + cell_type, int_cell_type, nr_select, nr_other, unif_int + )] + data.table::setnames(subset_cell_1A, + old = c( + "feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", + "perm_sel", "perm_other", "perm_log2fc", "perm_diff", + "cell_type", "int_cell_type", "nr_select", "nr_other" + ), + new = c( + "feats_1", "sel_1", "other_1", "log2fc_1", "diff_1", + "p.value_1", "p.adj_1", + "perm_sel_1", "perm_other_1", "perm_log2fc_1", + "perm_diff_1", + "cell_type_1", "int_cell_type_1", "nr_select_1", + "nr_other_1" + ) + ) + } else { + subset_cell_1A <- subset_cell_1[ + , .(feats, sel, other, log2fc, diff, p.value, p.adj, + cell_type, int_cell_type, nr_select, nr_other, unif_int)] + data.table::setnames(subset_cell_1A, + old = c("feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", "cell_type", "int_cell_type", "nr_select", + "nr_other"), + new = c("feats_1", "sel_1", "other_1", "log2fc_1", "diff_1", + "p.value_1", "p.adj_1", "cell_type_1", + "int_cell_type_1", "nr_select_1", "nr_other_1") + ) + } + + + ## CELL TYPE 2 + + if (test_used == "permutation") { + subset_cell_1B <- subset_cell_1[, .( + feats, sel, other, log2fc, diff, p.value, p.adj, + perm_sel, perm_other, perm_log2fc, perm_diff, + cell_type, int_cell_type, nr_select, nr_other, unif_int + )] + data.table::setnames(subset_cell_1B, + old = c( + "feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", + "perm_sel", "perm_other", "perm_log2fc", "perm_diff", + "cell_type", "int_cell_type", "nr_select", "nr_other" + ), + new = c( + "feats_2", "sel_2", "other_2", "log2fc_2", "diff_2", + "p.value_2", "p.adj_2", + "perm_sel_2", "perm_other_2", "perm_log2fc_2", + "perm_diff_2", + "cell_type_2", "int_cell_type_2", "nr_select_2", + "nr_other_2" + ) + ) + } else { + subset_cell_1B <- subset_cell_1[ + , .(feats, sel, other, log2fc, diff, p.value, p.adj, + cell_type, int_cell_type, nr_select, nr_other, unif_int)] + data.table::setnames(subset_cell_1B, + old = c("feats", "sel", "other", "log2fc", "diff", "p.value", + "p.adj", "cell_type", "int_cell_type", "nr_select", + "nr_other"), + new = c("feats_2", "sel_2", "other_2", "log2fc_2", "diff_2", + "p.value_2", 
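# Illustrative sketch, with toy values, of the merge_subsets step used in
# this helper: the per-cell-type score tables are joined on the shared
# "unif_int" key with allow.cartesian = TRUE, so every feats_1 x feats_2
# combination is kept (n1 surviving feats in one table and n2 in the other
# yield n1 * n2 merged rows).
library(data.table)
cell1 <- data.table(
    unif_int = "astrocyte--endothelial",
    feats_1 = c("featA", "featB"),
    log2fc_1 = c(0.8, 0.6)
)
cell2 <- data.table(
    unif_int = "astrocyte--endothelial",
    feats_2 = c("featC", "featD"),
    log2fc_2 = c(0.9, 0.4)
)
merged <- data.table::merge.data.table(
    cell1, cell2, by = "unif_int", allow.cartesian = TRUE
)
nrow(merged) # 4 rows: 2 feats_1 x 2 feats_2 combinations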
"p.adj_2", "cell_type_2", + "int_cell_type_2", "nr_select_2", "nr_other_2") + ) + } + + merge_subsets <- data.table::merge.data.table( + subset_cell_1A, subset_cell_1B, by = c("unif_int"), + allow.cartesian = TRUE) } - - ## CELL TYPE 2 - - if(test_used == 'permutation') { - subset_cell_1B = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, - perm_sel, perm_other, perm_log2fc, perm_diff, - cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1B, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', - 'perm_sel', 'perm_other', 'perm_log2fc', 'perm_diff', - 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_2', 'sel_2', 'other_2', 'log2fc_2', 'diff_2', 'p.value_2', 'p.adj_2', - 'perm_sel_2', 'perm_other_2', 'perm_log2fc_2', 'perm_diff_2', - 'cell_type_2', 'int_cell_type_2', 'nr_select_2', 'nr_other_2')) - - } else { - subset_cell_1B = subset_cell_1[,.(feats, sel, other, log2fc, diff, p.value, p.adj, cell_type, int_cell_type, nr_select, nr_other, unif_int)] - data.table::setnames(subset_cell_1B, old = c('feats', 'sel', 'other', 'log2fc', 'diff', 'p.value', 'p.adj', 'cell_type', 'int_cell_type', 'nr_select', 'nr_other'), - new = c('feats_2', 'sel_2', 'other_2', 'log2fc_2', 'diff_2', 'p.value_2', 'p.adj_2', 'cell_type_2', 'int_cell_type_2', 'nr_select_2', 'nr_other_2')) - + # restrict to feature combinations if needed + if ((!is.null(specific_feats_1) & !is.null(specific_feats_2))) { + merge_subsets[, feats_combo := paste0(feats_1, "--", feats_2)] + all_combos <- c( + paste0(specific_feats_1, "--", specific_feats_2), + paste0(specific_feats_2, "--", specific_feats_1) + ) + merge_subsets <- merge_subsets[feats_combo %in% all_combos] + merge_subsets[, feats_combo := NULL] } - merge_subsets = data.table::merge.data.table(subset_cell_1A, subset_cell_1B, by = c('unif_int'), allow.cartesian = TRUE) - - - } - - # restrict to feature combinations if needed - if((!is.null(specific_feats_1) & !is.null(specific_feats_2))) { - merge_subsets[, feats_combo := paste0(feats_1,'--',feats_2)] - all_combos = c(paste0(specific_feats_1,'--', specific_feats_2), - paste0(specific_feats_2,'--', specific_feats_1)) - merge_subsets = merge_subsets[feats_combo %in% all_combos] - merge_subsets[, feats_combo := NULL] - } - - merge_subsets[, type_int := type_interaction] - return(merge_subsets) - + merge_subsets[, type_int := type_interaction] + return(merge_subsets) } @@ -1777,8 +1947,10 @@ filterCPG <- function(...) { #' @param icfObject ICF (interaction changed feat) score object #' @param selected_ints subset of selected cell-cell interactions (optional) #' @param selected_feats subset of selected Features (optional) -#' @param specific_feats_1 specific Featureset combo (need to position match specific_feats_2) -#' @param specific_feats_2 specific Featureset combo (need to position match specific_feats_1) +#' @param specific_feats_1 specific Featureset combo +#' (need to position match specific_feats_2) +#' @param specific_feats_2 specific Featureset combo +#' (need to position match specific_feats_1) #' @param min_cells minimum number of target cell type #' @param min_int_cells minimum number of interacting cell type #' @param min_fdr minimum adjusted p-value @@ -1788,105 +1960,111 @@ filterCPG <- function(...) 
{ #' @param verbose verbose #' @return combIcfObject that contains the filtered differential feature scores #' @export -combineInteractionChangedFeats = function(icfObject, - selected_ints = NULL, - selected_feats = NULL, - specific_feats_1 = NULL, - specific_feats_2 = NULL, - min_cells = 5, - min_int_cells = 3, - min_fdr = 0.05, - min_spat_diff = 0, - min_log2_fc = 0.5, - do_parallel = TRUE, - verbose = T) { - - # data.table variables - unif_int = feat1_feat2 = feats_1 = feats_2 = comb_logfc = log2fc_1 = log2fc_2 = direction = NULL - - ## check validity - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - ICFscore = copy(icfObject[['ICFscores']]) - - if(!is.null(selected_ints)) { - ICFscore = ICFscore[unif_int %in% selected_ints] - } - - all_ints = unique(ICFscore[['unif_int']]) - - # parallel - if(do_parallel == TRUE) { - - FTFresults = lapply_flex(X = all_ints, FUN = function(x) { - - tempres = .combineInteractionChangedFeatures_per_interaction(icfObject = icfObject, - sel_int = x, - selected_feats = selected_feats, - specific_feats_1 = specific_feats_1, - specific_feats_2 = specific_feats_2, - min_cells = min_cells, - min_int_cells = min_int_cells, - min_fdr = min_fdr, - min_spat_diff = min_spat_diff, - min_log2_fc = min_log2_fc) - - }) - - - } else { - # for loop - FTFresults = list() - - for(i in seq_along(all_ints)) { - - x = all_ints[[i]] - - if(verbose == TRUE) print(x) - - tempres = .combineInteractionChangedFeatures_per_interaction(icfObject = icfObject, - sel_int = x, - selected_feats = selected_feats, - specific_feats_1 = specific_feats_1, - specific_feats_2 = specific_feats_2, - min_cells = min_cells, - min_int_cells = min_int_cells, - min_fdr = min_fdr, - min_spat_diff = min_spat_diff, - min_log2_fc = min_log2_fc) - FTFresults[[i]] = tempres +combineInteractionChangedFeats <- function(icfObject, + selected_ints = NULL, + selected_feats = NULL, + specific_feats_1 = NULL, + specific_feats_2 = NULL, + min_cells = 5, + min_int_cells = 3, + min_fdr = 0.05, + min_spat_diff = 0, + min_log2_fc = 0.5, + do_parallel = TRUE, + verbose = TRUE) { + # data.table variables + unif_int <- feat1_feat2 <- feats_1 <- feats_2 <- comb_logfc <- + log2fc_1 <- log2fc_2 <- direction <- NULL + + ## check validity + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") } + ICFscore <- copy(icfObject[["ICFscores"]]) - } - - final_results = do.call('rbind', FTFresults) - - final_results[, feat1_feat2 := paste0(feats_1,'--',feats_2)] + if (!is.null(selected_ints)) { + ICFscore <- ICFscore[unif_int %in% selected_ints] + } - final_results = dt_sort_combine_two_columns(final_results, - column1 = 'feats_1', column2 = 'feats_2', - myname = 'unif_feat_feat') + all_ints <- unique(ICFscore[["unif_int"]]) + + # parallel + if (do_parallel == TRUE) { + FTFresults <- lapply_flex(X = all_ints, FUN = function(x) { + tempres <- .combineInteractionChangedFeatures_per_interaction( + icfObject = icfObject, + sel_int = x, + selected_feats = selected_feats, + specific_feats_1 = specific_feats_1, + specific_feats_2 = specific_feats_2, + min_cells = min_cells, + min_int_cells = min_int_cells, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc + ) + }) + } else { + # for loop + FTFresults <- list() + + for (i in seq_along(all_ints)) { + x <- all_ints[[i]] + + if (verbose == TRUE) print(x) + + tempres <- 
.combineInteractionChangedFeatures_per_interaction( + icfObject = icfObject, + sel_int = x, + selected_feats = selected_feats, + specific_feats_1 = specific_feats_1, + specific_feats_2 = specific_feats_2, + min_cells = min_cells, + min_int_cells = min_int_cells, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc + ) + FTFresults[[i]] <- tempres + } + } - final_results[, comb_logfc := abs(log2fc_1) + abs(log2fc_2)] - setorder(final_results, -comb_logfc) - final_results[, direction := ifelse(log2fc_1 > 0 & log2fc_2 > 0, 'both_up', - ifelse(log2fc_1 < 0 & log2fc_2 < 0, 'both_down', 'mixed'))] + final_results <- do.call("rbind", FTFresults) - combIcfObject = list(combICFscores = final_results, - Giotto_info = list('values' = icfObject[['Giotto_info']][['values']], - 'cluster' = icfObject[['Giotto_info']][['cluster']], - 'spatial network' = icfObject[['Giotto_info']][['spatial network']]), - test_info = list('test' = icfObject[['test_info']][['test']], - 'p.adj' = icfObject[['test_info']][['p.adj']], - 'min cells' = icfObject[['test_info']][['min cells']], - 'min interacting cells' = icfObject[['test_info']][['min interacting cells']], - 'exclude selected cells' = icfObject[['test_info']][['exclude selected cells']], - 'perm' = icfObject[['test_info']][['perm']])) - class(combIcfObject) = append(class(combIcfObject), 'combIcfObject') - return(combIcfObject) + final_results[, feat1_feat2 := paste0(feats_1, "--", feats_2)] + final_results <- dt_sort_combine_two_columns(final_results, + column1 = "feats_1", column2 = "feats_2", + myname = "unif_feat_feat" + ) + final_results[, comb_logfc := abs(log2fc_1) + abs(log2fc_2)] + setorder(final_results, -comb_logfc) + final_results[, direction := ifelse(log2fc_1 > 0 & log2fc_2 > 0, "both_up", + ifelse(log2fc_1 < 0 & log2fc_2 < 0, "both_down", "mixed") + )] + + combIcfObject <- list( + combICFscores = final_results, + Giotto_info = list( + "values" = icfObject[["Giotto_info"]][["values"]], + "cluster" = icfObject[["Giotto_info"]][["cluster"]], + "spatial network" = icfObject[["Giotto_info"]][["spatial network"]] + ), + test_info = list( + "test" = icfObject[["test_info"]][["test"]], + "p.adj" = icfObject[["test_info"]][["p.adj"]], + "min cells" = icfObject[["test_info"]][["min cells"]], + "min interacting cells" = icfObject[["test_info"]][[ + "min interacting cells"]], + "exclude selected cells" = icfObject[["test_info"]][[ + "exclude selected cells"]], + "perm" = icfObject[["test_info"]][["perm"]] + ) + ) + class(combIcfObject) <- append(class(combIcfObject), "combIcfObject") + return(combIcfObject) } @@ -1896,12 +2074,10 @@ combineInteractionChangedFeats = function(icfObject, #' @inheritDotParams combineInteractionChangedFeats #' @seealso \code{\link{combineInteractionChangedFeats}} #' @export -combineInteractionChangedGenes = function(...) { - - .Deprecated(new = "combineInteractionChangedFeats") - - combineInteractionChangedFeats(...) +combineInteractionChangedGenes <- function(...) { + .Deprecated(new = "combineInteractionChangedFeats") + combineInteractionChangedFeats(...) } @@ -1912,11 +2088,9 @@ combineInteractionChangedGenes = function(...) { #' @seealso \code{\link{combineInteractionChangedFeats}} #' @export combineCellProximityGenes <- function(...) { + .Deprecated(new = "combineInteractionChangedFeats") - .Deprecated(new = "combineInteractionChangedFeats") - - combineInteractionChangedFeats(...) - + combineInteractionChangedFeats(...) 
} #' @title combineICF @@ -1925,8 +2099,10 @@ combineCellProximityGenes <- function(...) { #' @param icfObject ICF (interaction changed feat) score object #' @param selected_ints subset of selected cell-cell interactions (optional) #' @param selected_feats subset of selected Feats (optional) -#' @param specific_feats_1 specific Featset combo (need to position match specific_genes_2) -#' @param specific_feats_2 specific Featset combo (need to position match specific_genes_1) +#' @param specific_feats_1 specific Featset combo +#' (need to position match specific_genes_2) +#' @param specific_feats_2 specific Featset combo +#' (need to position match specific_genes_1) #' @param min_cells minimum number of target cell type #' @param min_int_cells minimum number of interacting cell type #' @param min_fdr minimum adjusted p-value @@ -1937,33 +2113,32 @@ combineCellProximityGenes <- function(...) { #' @return icfObject that contains the filtered differential feats scores #' @export combineICF <- function(icfObject, - selected_ints = NULL, - selected_feats = NULL, - specific_feats_1 = NULL, - specific_feats_2 = NULL, - min_cells = 5, - min_int_cells = 3, - min_fdr = 0.05, - min_spat_diff = 0, - min_log2_fc = 0.5, - do_parallel = TRUE, - verbose = T){ - - combineInteractionChangedFeats(icfObject = icfObject, - selected_ints = selected_ints, - selected_feats = selected_feats, - specific_feats_1 = specific_feats_1, - specific_feats_2 = specific_feats_2, - min_cells = min_cells, - min_int_cells = min_int_cells, - min_fdr = min_fdr, - min_spat_diff = min_spat_diff, - min_log2_fc = min_log2_fc, - do_parallel = do_parallel, - verbose = verbose) - - - } + selected_ints = NULL, + selected_feats = NULL, + specific_feats_1 = NULL, + specific_feats_2 = NULL, + min_cells = 5, + min_int_cells = 3, + min_fdr = 0.05, + min_spat_diff = 0, + min_log2_fc = 0.5, + do_parallel = TRUE, + verbose = TRUE) { + combineInteractionChangedFeats( + icfObject = icfObject, + selected_ints = selected_ints, + selected_feats = selected_feats, + specific_feats_1 = specific_feats_1, + specific_feats_2 = specific_feats_2, + min_cells = min_cells, + min_int_cells = min_int_cells, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc, + do_parallel = do_parallel, + verbose = verbose + ) +} #' @title combineICG @@ -1972,13 +2147,10 @@ combineICF <- function(icfObject, #' @inheritDotParams combineICF #' @seealso \code{\link{combineICF}} #' @export -combineICG = function(...) { - - .Deprecated(new = "combineICF") - - combineICF(...) - +combineICG <- function(...) { + .Deprecated(new = "combineICF") + combineICF(...) } #' @title combineCPG @@ -1988,11 +2160,9 @@ combineICG = function(...) { #' @seealso \code{\link{combineICF}} #' @export combineCPG <- function(...) { + .Deprecated(new = "combineICF") - .Deprecated(new = "combineICF") - - combineICF(...) - + combineICF(...) } @@ -2013,75 +2183,91 @@ combineCPG <- function(...) 
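# Minimal usage sketch for combineICF(), assuming `icf_scores` is a
# hypothetical icfObject from findICF(); the thresholds shown are the
# documented defaults.
comb_icf <- combineICF(
    icfObject = icf_scores,
    min_cells = 5,
    min_int_cells = 3,
    min_fdr = 0.05,
    min_log2_fc = 0.5,
    do_parallel = TRUE
)
# combined scores are ordered by abs(log2fc_1) + abs(log2fc_2); `direction`
# is "both_up" when both log2 fold-changes are positive, "both_down" when
# both are negative, and "mixed" otherwise
head(comb_icf[["combICFscores"]])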
{ #' @param feat_set_2 second specific feat set from feat pairs #' @return data.table with average expression scores for each cluster #' @keywords internal -average_feat_feat_expression_in_groups = function(gobject, - spat_unit = NULL, - feat_type = NULL, - cluster_column = 'cell_types', - feat_set_1, - feat_set_2) { - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - average_DT = create_average_DT(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - meta_data_name = cluster_column) - - # change column names back to original - new_colnames = gsub(pattern = 'cluster_', replacement = '', colnames(average_DT)) - colnames(average_DT) = new_colnames - - # keep order of colnames - colnames_order = sort(new_colnames) - - # feat_set_1 and feat_set_2 need to have same length and all feats need to be present in data - if(length(feat_set_1) != length(feat_set_2)) { - stop('\n length of set1 needs to be the same as that of set2 \n') - } - - if(!all(c(feat_set_1, feat_set_2) %in% rownames(average_DT) == T)) { - stop('\n all selected feats from set 1 and 2 need to be present \n') - } - - LR_pairs = paste0(feat_set_1,'-',feat_set_2) - - # get ligand and receptor information - ligand_match = average_DT[match(feat_set_1, rownames(average_DT)), ,drop = F] - receptor_match = average_DT[match(feat_set_2, rownames(average_DT)), ,drop = F] - - # data.table variables - ligand = LR_comb = receptor = LR_expr = lig_expr = rec_expr = lig_cell_type = rec_cell_type = NULL - - all_ligand_cols = colnames(ligand_match) - lig_test = data.table::as.data.table(reshape2::melt(ligand_match, measure.vars = all_ligand_cols)) - lig_test[, ligand := rep(rownames(ligand_match), ncol(ligand_match))] - lig_test[, ligand := strsplit(ligand,'\\.')[[1]][1] , by = 1:nrow(lig_test)] - lig_test[, LR_comb := rep(LR_pairs, ncol(ligand_match))] - setnames(lig_test, 'value', 'lig_expr') - setnames(lig_test, 'variable', 'lig_cell_type') - - all_receptor_cols = colnames(receptor_match) - rec_test = data.table::as.data.table(reshape2::melt(receptor_match, measure.vars = all_receptor_cols)) - rec_test[, receptor := rep(rownames(receptor_match), ncol(receptor_match))] - rec_test[, receptor := strsplit(receptor,'\\.')[[1]][1] , by = 1:nrow(rec_test)] - rec_test[, LR_comb := rep(LR_pairs, ncol(receptor_match))] - setnames(rec_test, 'value', 'rec_expr') - setnames(rec_test, 'variable', 'rec_cell_type') - - lig_rec_test = merge(lig_test, rec_test, by = 'LR_comb', allow.cartesian = T) - lig_rec_test[, LR_expr := lig_expr+rec_expr] - - - lig_rec_test[, lig_cell_type := factor(lig_cell_type, levels = colnames_order)] - lig_rec_test[, rec_cell_type := factor(rec_cell_type, levels = colnames_order)] - setorder(lig_rec_test, LR_comb, lig_cell_type, rec_cell_type) - - return(lig_rec_test) +average_feat_feat_expression_in_groups <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + cluster_column = "cell_types", + feat_set_1, + feat_set_2) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + average_DT <- create_average_DT( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + meta_data_name = cluster_column + ) + + # change column names back to original + 
new_colnames <- gsub( + pattern = "cluster_", replacement = "", colnames(average_DT)) + colnames(average_DT) <- new_colnames + + # keep order of colnames + colnames_order <- sort(new_colnames) + + # feat_set_1 and feat_set_2 need to have same length and all feats need + # to be present in data + if (length(feat_set_1) != length(feat_set_2)) { + stop("length of set1 needs to be the same as that of set2") + } + if (!all(c(feat_set_1, feat_set_2) %in% rownames(average_DT) == TRUE)) { + stop("all selected feats from set 1 and 2 need to be present") + } + + LR_pairs <- paste0(feat_set_1, "-", feat_set_2) + + # get ligand and receptor information + ligand_match <- average_DT[ + match(feat_set_1, rownames(average_DT)), , drop = FALSE] + receptor_match <- average_DT[ + match(feat_set_2, rownames(average_DT)), , drop = FALSE] + + # data.table variables + ligand <- LR_comb <- receptor <- LR_expr <- lig_expr <- rec_expr <- + lig_cell_type <- rec_cell_type <- NULL + + all_ligand_cols <- colnames(ligand_match) + lig_test <- data.table::as.data.table( + reshape2::melt(ligand_match, measure.vars = all_ligand_cols)) + lig_test[, ligand := rep(rownames(ligand_match), ncol(ligand_match))] + lig_test[, ligand := strsplit(ligand, "\\.")[[1]][1], by = 1:nrow(lig_test)] + lig_test[, LR_comb := rep(LR_pairs, ncol(ligand_match))] + setnames(lig_test, "value", "lig_expr") + setnames(lig_test, "variable", "lig_cell_type") + + all_receptor_cols <- colnames(receptor_match) + rec_test <- data.table::as.data.table(reshape2::melt( + receptor_match, measure.vars = all_receptor_cols)) + rec_test[, receptor := rep(rownames(receptor_match), ncol(receptor_match))] + rec_test[, receptor := strsplit( + receptor, "\\.")[[1]][1], by = 1:nrow(rec_test)] + rec_test[, LR_comb := rep(LR_pairs, ncol(receptor_match))] + setnames(rec_test, "value", "rec_expr") + setnames(rec_test, "variable", "rec_cell_type") + + lig_rec_test <- merge( + lig_test, rec_test, by = "LR_comb", allow.cartesian = TRUE) + lig_rec_test[, LR_expr := lig_expr + rec_expr] + + + lig_rec_test[, lig_cell_type := factor( + lig_cell_type, levels = colnames_order)] + lig_rec_test[, rec_cell_type := factor( + rec_cell_type, levels = colnames_order)] + setorder(lig_rec_test, LR_comb, lig_cell_type, rec_cell_type) + + return(lig_rec_test) } @@ -2096,179 +2282,192 @@ average_feat_feat_expression_in_groups = function(gobject, #' @param feat_set_1 first specific feature set from feature pairs #' @param feat_set_2 second specific feature set from feature pairs #' @param log2FC_addendum addendum to add when calculating log2FC -#' @param detailed provide more detailed information (random variance and z-score) +#' @param detailed provide more detailed information +#' (random variance and z-score) #' @param adjust_method which method to adjust p-values #' @param adjust_target adjust multiple hypotheses at the cell or feature level #' @param set_seed set seed for random simulations (default = TRUE) #' @param seed_number seed number #' @param verbose verbose -#' @return Cell-Cell communication scores for feature pairs based on expression only -#' @details Statistical framework to identify if pairs of features (such as ligand-receptor combinations) -#' are expressed at higher levels than expected based on a reshuffled null distribution of feature expression values, +#' @return Cell-Cell communication scores for feature pairs based on +#' expression only +#' @details Statistical framework to identify if pairs of features +#' (such as ligand-receptor combinations) +#' are expressed at 
higher levels than expected based on a reshuffled null +#' distribution of feature expression values, #' without considering the spatial position of cells. #' More details will follow soon. #' @export -exprCellCellcom = function(gobject, - feat_type = NULL, - spat_unit = NULL, - cluster_column = 'cell_types', - random_iter = 1000, - feat_set_1, - feat_set_2, - log2FC_addendum = 0.1, - detailed = FALSE, - adjust_method = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none"), - adjust_target = c('feats', 'cells'), - set_seed = TRUE, - seed_number = 1234, - verbose = TRUE) { - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # data.table variables - lig_nr = lig_cell_type = rec_nr = rec_cell_type = rand_expr = av_diff = log2fc = LR_expr = pvalue = NULL - LR_cell_comb = p.adj = LR_comb = PI = sd_diff = z_score = NULL - - # get parameters - adjust_method = match.arg(adjust_method, choices = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none")) - adjust_target = match.arg(adjust_target, choices = c('feats', 'cells')) - - # get information about number of cells - cell_metadata = pDataDT(gobject, - feat_type = feat_type, - spat_unit = spat_unit) - nr_cell_types = cell_metadata[,.N, by = c(cluster_column)] - nr_cells = nr_cell_types$N - names(nr_cells) = nr_cell_types$cluster_column - - - comScore = average_feat_feat_expression_in_groups( - gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - feat_set_1 = feat_set_1, - feat_set_2 = feat_set_2 - ) - - comScore[, lig_nr := nr_cells[lig_cell_type]] - comScore[, rec_nr := nr_cells[rec_cell_type]] - - # prepare for randomized scores - total_av = rep(0, nrow(comScore)) - - if(detailed == FALSE) { - total_sum = rep(0, nrow(comScore)) - } else { - total_sum = matrix(nrow = nrow(comScore), ncol = random_iter) - } - - total_bool = rep(0, nrow(comScore)) - - - ## parallel option ## - # not yet available - - - for(sim in 1:random_iter) { - - if(verbose == TRUE) cat('simulation ', sim, '\n') - - - # create temporary giotto - tempGiotto = subsetGiotto(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit) - - # randomize annoation - cell_types = cell_metadata[[cluster_column]] - if(set_seed == TRUE) { - seed_number = seed_number+sim - set.seed(seed = seed_number) - } - random_cell_types = sample(x = cell_types, size = length(cell_types)) - tempGiotto = addCellMetadata( - gobject = tempGiotto, - feat_type = feat_type, - spat_unit = spat_unit, - new_metadata = random_cell_types, - by_column = FALSE # on purpose since values are random +exprCellCellcom <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + cluster_column = "cell_types", + random_iter = 1000, + feat_set_1, + feat_set_2, + log2FC_addendum = 0.1, + detailed = FALSE, + adjust_method = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + ), + adjust_target = c("feats", "cells"), + set_seed = TRUE, + seed_number = 1234, + verbose = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit ) - - # get random communication scores - randomScore = average_feat_feat_expression_in_groups( - gobject = tempGiotto, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = 'random_cell_types', - feat_set_1 = feat_set_1, - 
feat_set_2 = feat_set_2 + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - # average random score - total_av = total_av + randomScore[['LR_expr']] + # data.table variables + lig_nr <- lig_cell_type <- rec_nr <- rec_cell_type <- rand_expr <- + av_diff <- log2fc <- LR_expr <- pvalue <- NULL + LR_cell_comb <- p.adj <- LR_comb <- PI <- sd_diff <- z_score <- NULL - # difference between observed and random - difference = comScore[['LR_expr']] - randomScore[['LR_expr']] + # get parameters + adjust_method <- match.arg(adjust_method, choices = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + )) + adjust_target <- match.arg(adjust_target, choices = c("feats", "cells")) - # calculate total difference - if(detailed == FALSE) { - total_sum = total_sum+difference - } else { - total_sum[,sim] = difference - } - - # calculate p-values - difference[difference > 0] = 1 - difference[difference < 0] = -1 - total_bool = total_bool + difference + # get information about number of cells + cell_metadata <- pDataDT(gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) + nr_cell_types <- cell_metadata[, .N, by = c(cluster_column)] + nr_cells <- nr_cell_types$N + names(nr_cells) <- nr_cell_types$cluster_column + + + comScore <- average_feat_feat_expression_in_groups( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2 + ) - } + comScore[, lig_nr := nr_cells[lig_cell_type]] + comScore[, rec_nr := nr_cells[rec_cell_type]] - comScore[, rand_expr := total_av/random_iter] + # prepare for randomized scores + total_av <- rep(0, nrow(comScore)) - if(detailed == TRUE) { - av_difference_scores = rowMeans_flex(total_sum) - sd_difference_scores = apply(total_sum, MARGIN = 1, FUN = stats::sd) + if (detailed == FALSE) { + total_sum <- rep(0, nrow(comScore)) + } else { + total_sum <- matrix(nrow = nrow(comScore), ncol = random_iter) + } - comScore[, av_diff := av_difference_scores] - comScore[, sd_diff := sd_difference_scores] - comScore[, z_score := (LR_expr - rand_expr)/sd_diff] + total_bool <- rep(0, nrow(comScore)) + + + ## parallel option ## + # not yet available + + + for (sim in 1:random_iter) { + if (verbose == TRUE) cat("simulation ", sim) + + + # create temporary giotto + tempGiotto <- subsetGiotto( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) + + # randomize annoation + cell_types <- cell_metadata[[cluster_column]] + if (set_seed == TRUE) { + seed_number <- seed_number + sim + set.seed(seed = seed_number) + } + random_cell_types <- sample(x = cell_types, size = length(cell_types)) + tempGiotto <- addCellMetadata( + gobject = tempGiotto, + feat_type = feat_type, + spat_unit = spat_unit, + new_metadata = random_cell_types, + by_column = FALSE # on purpose since values are random + ) + + # get random communication scores + randomScore <- average_feat_feat_expression_in_groups( + gobject = tempGiotto, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = "random_cell_types", + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2 + ) + + # average random score + total_av <- total_av + randomScore[["LR_expr"]] + + # difference between observed and random + difference <- comScore[["LR_expr"]] - randomScore[["LR_expr"]] + + # calculate total difference + if (detailed == FALSE) { + total_sum <- total_sum + difference + } else { + total_sum[, sim] <- difference + } + + # calculate 
p-values + difference[difference > 0] <- 1 + difference[difference < 0] <- -1 + total_bool <- total_bool + difference + } - } else { - comScore[, av_diff := total_sum/random_iter] - } + comScore[, rand_expr := total_av / random_iter] - comScore[, log2fc := log2((LR_expr+log2FC_addendum)/(rand_expr+log2FC_addendum))] - comScore[, pvalue := total_bool/random_iter] - comScore[, pvalue := ifelse(pvalue > 0, 1-pvalue, 1+pvalue)] - comScore[, LR_cell_comb := paste0(lig_cell_type,'--',rec_cell_type)] + if (detailed == TRUE) { + av_difference_scores <- rowMeans_flex(total_sum) + sd_difference_scores <- apply(total_sum, MARGIN = 1, FUN = stats::sd) - if(adjust_target == 'feats') { - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_cell_comb)] - } else if(adjust_target == 'cells'){ - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_comb)] - } + comScore[, av_diff := av_difference_scores] + comScore[, sd_diff := sd_difference_scores] + comScore[, z_score := (LR_expr - rand_expr) / sd_diff] + } else { + comScore[, av_diff := total_sum / random_iter] + } + comScore[, log2fc := log2((LR_expr + log2FC_addendum) / ( + rand_expr + log2FC_addendum))] + comScore[, pvalue := total_bool / random_iter] + comScore[, pvalue := ifelse(pvalue > 0, 1 - pvalue, 1 + pvalue)] + comScore[, LR_cell_comb := paste0(lig_cell_type, "--", rec_cell_type)] + + if (adjust_target == "feats") { + comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), + by = .(LR_cell_comb)] + } else if (adjust_target == "cells") { + comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), + by = .(LR_comb)] + } - # get minimum adjusted p.value that is not zero - all_p.adj = comScore[['p.adj']] - lowest_p.adj = min(all_p.adj[all_p.adj != 0]) - comScore[, PI := ifelse(p.adj == 0, log2fc*(-log10(lowest_p.adj)), log2fc*(-log10(p.adj)))] - #comScore[, PI := log2fc*(1-p.adj)] - data.table::setorder(comScore, LR_comb, -LR_expr) + # get minimum adjusted p.value that is not zero + all_p.adj <- comScore[["p.adj"]] + lowest_p.adj <- min(all_p.adj[all_p.adj != 0]) + comScore[, PI := ifelse(p.adj == 0, log2fc * (-log10(lowest_p.adj)), + log2fc * (-log10(p.adj)))] - return(comScore) + data.table::setorder(comScore, LR_comb, -LR_expr) + return(comScore) } @@ -2279,48 +2478,54 @@ exprCellCellcom = function(gobject, #' @param gobject giotto object to use #' @param feat_type feature type #' @param cluster_column cluster column with cell type information -#' @param needed_cell_types vector of cell type names for which a random id will be found +#' @param needed_cell_types vector of cell type names for which a random id +#' will be found #' @param set_seed set a seed for reproducibility #' @param seed_number seed number #' @return list of randomly sampled cell ids with same cell type composition #' @keywords internal -.create_cell_type_random_cell_IDs = function(gobject, - feat_type = NULL, - spat_unit = NULL, - cluster_column = 'cell_types', - needed_cell_types, - set_seed = FALSE, - seed_number = 1234) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # subset metadata to choose from - full_metadata = pDataDT(gobject, - feat_type = feat_type, - spat_unit = spat_unit) - possible_metadata = full_metadata[get(cluster_column) %in% unique(needed_cell_types)] - - sample_ids = list() - - uniq_types = 
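# Minimal usage sketch for exprCellCellcom(), assuming a giotto object
# `gobject` with a "cell_types" metadata column; the ligand/receptor names
# below are placeholders and must be position-matched between the two sets.
expr_com <- exprCellCellcom(
    gobject = gobject,
    cluster_column = "cell_types",
    random_iter = 1000,                     # number of label reshufflings
    feat_set_1 = c("ligandA", "ligandB"),
    feat_set_2 = c("receptorA", "receptorB"),
    adjust_method = "fdr",
    adjust_target = "feats",
    set_seed = TRUE
)
# PI combines effect size and significance: PI = log2fc * -log10(p.adj);
# for example, log2fc = 1.5 with p.adj = 0.01 gives PI = 1.5 * 2 = 3
head(expr_com)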
unique(needed_cell_types) - - for(i in seq_along(uniq_types)) { - - uniq_type = uniq_types[i] - length_random = length(needed_cell_types[needed_cell_types == uniq_type]) - if(set_seed == TRUE) { - set.seed(seed = seed_number) - } - sub_sample_ids = possible_metadata[get(cluster_column) == uniq_type][sample(x = 1:.N, size = length_random)][['cell_ID']] - sample_ids[[i]] = sub_sample_ids +.create_cell_type_random_cell_IDs <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + cluster_column = "cell_types", + needed_cell_types, + set_seed = FALSE, + seed_number = 1234) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - } - return(unlist(sample_ids)) + # subset metadata to choose from + full_metadata <- pDataDT(gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) + possible_metadata <- full_metadata[get(cluster_column) %in% unique( + needed_cell_types)] + + sample_ids <- list() + + uniq_types <- unique(needed_cell_types) + + for (i in seq_along(uniq_types)) { + uniq_type <- uniq_types[i] + length_random <- length(needed_cell_types[ + needed_cell_types == uniq_type]) + if (set_seed == TRUE) { + set.seed(seed = seed_number) + } + sub_sample_ids <- possible_metadata[get(cluster_column) == uniq_type][ + sample(x = 1:.N, size = length_random)][["cell_ID"]] + sample_ids[[i]] <- sub_sample_ids + } + return(unlist(sample_ids)) } @@ -2328,11 +2533,13 @@ exprCellCellcom = function(gobject, #' @title specificCellCellcommunicationScores #' @name specificCellCellcommunicationScores -#' @description Specific Cell-Cell communication scores based on spatial expression of interacting cells +#' @description Specific Cell-Cell communication scores based on spatial +#' expression of interacting cells #' @param gobject giotto object to use #' @param feat_type feature type #' @param spat_unit spatial unit -#' @param spatial_network_name spatial network to use for identifying interacting cells +#' @param spatial_network_name spatial network to use for identifying +#' interacting cells #' @param cluster_column cluster column with cell type information #' @param random_iter number of iterations #' @param cell_type_1 first cell type @@ -2342,17 +2549,22 @@ exprCellCellcom = function(gobject, #' @param gene_set_1 deprecated, use feat_set_1 #' @param gene_set_2 deprecated, use feat_set_2 #' @param log2FC_addendum addendum to add when calculating log2FC -#' @param min_observations minimum number of interactions needed to be considered -#' @param detailed provide more detailed information (random variance and z-score) +#' @param min_observations minimum number of interactions needed to be +#' considered +#' @param detailed provide more detailed information +#' (random variance and z-score) #' @param adjust_method which method to adjust p-values #' @param adjust_target adjust multiple hypotheses at the cell or feature level #' @param set_seed set a seed for reproducibility #' @param seed_number seed number #' @param verbose verbose -#' @return Cell-Cell communication scores for feature pairs based on spatial interaction -#' @details Statistical framework to identify if pairs of features (such as ligand-receptor combinations) -#' are expressed at higher levels than expected based on a reshuffled null distribution -#' of feature expression values in cells that are spatially in proximity to eachother. 
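# Illustrative sketch, with toy metadata, of the cell-type-matched sampling
# done by .create_cell_type_random_cell_IDs() above: for each cell type among
# the interacting cells, the same number of cell IDs is drawn at random from
# all cells of that type, so the random set keeps the observed cell-type
# composition.
meta <- data.frame(
    cell_ID = paste0("cell_", 1:6),
    cell_types = c("astro", "astro", "endo", "endo", "endo", "micro")
)
needed <- c("astro", "endo", "endo") # composition of the interacting cells
sampled <- unlist(lapply(unique(needed), function(ct) {
    n_ct <- sum(needed == ct)
    sample(meta$cell_ID[meta$cell_types == ct], size = n_ct)
}))
sampled # one random "astro" ID and two random "endo" IDs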
+#' @return Cell-Cell communication scores for feature pairs based on spatial +#' interaction +#' @details Statistical framework to identify if pairs of features +#' (such as ligand-receptor combinations) +#' are expressed at higher levels than expected based on a reshuffled null +#' distribution of feature expression values in cells that are spatially in +#' proximity to eachother. #' \itemize{ #' \item{LR_comb:}{Pair of ligand and receptor} #' \item{lig_cell_type:}{ cell type to assess expression level of ligand } @@ -2375,225 +2587,252 @@ exprCellCellcom = function(gobject, #' \item{PI:}{ significanc score: log2fc * -log10(p.adj) } #' } #' @export -specificCellCellcommunicationScores = function(gobject, - feat_type = NULL, - spat_unit = NULL, - spatial_network_name = 'Delaunay_network', - cluster_column = 'cell_types', - random_iter = 100, - cell_type_1 = 'astrocyte', - cell_type_2 = 'endothelial', - feat_set_1, - feat_set_2, - gene_set_1 = NULL, - gene_set_2 = NULL, - log2FC_addendum = 0.1, - min_observations = 2, - detailed = FALSE, - adjust_method = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none"), - adjust_target = c('feats', 'cells'), - set_seed = FALSE, - seed_number = 1234, - verbose = T) { - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## deprecated arguments - if(!is.null(gene_set_1)) { - feat_set_1 = gene_set_1 - warning('gene_set_1 is deprecated, use feat_set_1 in the future \n') - } - if(!is.null(gene_set_2)) { - feat_set_2 = gene_set_2 - warning('gene_set_2 is deprecated, use feat_set_2 in the future \n') - } - - - # data.table variables - from_to = cell_ID = lig_cell_type = rec_cell_type = lig_nr = rec_nr = rand_expr = NULL - av_diff = log2fc = LR_expr = pvalue = LR_cell_comb = p.adj = LR_comb = PI = NULL - sd_diff = z_score = NULL - - # get parameters - adjust_method = match.arg(adjust_method, choices = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none")) - adjust_target = match.arg(adjust_target, choices = c('feats', 'cells')) - - # metadata - cell_metadata = pDataDT(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit) - - # get annotated spatial network - annot_network = annotateSpatialNetwork(gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - cell_direction_1 = paste0(cell_type_1,'-',cell_type_2) - cell_direction_2 = paste0(cell_type_2,'-',cell_type_1) - - subset_annot_network = annot_network[from_to %in% c(cell_direction_1, cell_direction_2)] - - # make sure that there are sufficient observations - if(nrow(subset_annot_network) <= min_observations) { - - return(NULL) - - } else { - - - # subset giotto object to only interacting cells - subset_ids = unique(c(subset_annot_network$to, subset_annot_network$from)) - subsetGiotto = subsetGiotto(gobject = gobject, - cell_ids = subset_ids, - feat_type = feat_type, - spat_unit = spat_unit) - - # get information about number of cells - temp_meta = pDataDT(subsetGiotto, - feat_type = feat_type, - spat_unit = spat_unit) - nr_cell_types = temp_meta[cell_ID %in% subset_ids][,.N, by = c(cluster_column)] - nr_cells = nr_cell_types$N - names(nr_cells) = nr_cell_types$cell_types - - # get average communication scores - comScore = average_feat_feat_expression_in_groups(gobject = subsetGiotto, - feat_type = 
feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - feat_set_1 = feat_set_1, - feat_set_2 = feat_set_2) - comScore = comScore[(lig_cell_type == cell_type_1 & rec_cell_type == cell_type_2) | - (lig_cell_type == cell_type_2 & rec_cell_type == cell_type_1)] - - comScore[, lig_nr := nr_cells[lig_cell_type]] - comScore[, rec_nr := nr_cells[rec_cell_type]] - - # prepare for randomized scores - total_av = rep(0, nrow(comScore)) +specificCellCellcommunicationScores <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + spatial_network_name = "Delaunay_network", + cluster_column = "cell_types", + random_iter = 100, + cell_type_1 = "astrocyte", + cell_type_2 = "endothelial", + feat_set_1, + feat_set_2, + gene_set_1 = NULL, + gene_set_2 = NULL, + log2FC_addendum = 0.1, + min_observations = 2, + detailed = FALSE, + adjust_method = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + ), + adjust_target = c("feats", "cells"), + set_seed = FALSE, + seed_number = 1234, + verbose = TRUE) { + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - if(detailed == FALSE) { - total_sum = rep(0, nrow(comScore)) - } else { - total_sum = matrix(nrow = nrow(comScore), ncol = random_iter) + ## deprecated arguments + if (!is.null(gene_set_1)) { + feat_set_1 <- gene_set_1 + warning("gene_set_1 is deprecated, use feat_set_1 in the future") } - - total_bool = rep(0, nrow(comScore)) - - # identify which cell types you need - subset_metadata = cell_metadata[cell_ID %in% subset_ids] - needed_cell_types = subset_metadata[[cluster_column]] - - - - ## simulations ## - for(sim in 1:random_iter) { - - if(verbose == TRUE) cat('simulation ', sim, '\n') - - # get random ids and subset - if(set_seed == TRUE) { - seed_number = seed_number+sim - set.seed(seed = seed_number) - } - random_ids = .create_cell_type_random_cell_IDs(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - needed_cell_types = needed_cell_types, - set_seed = set_seed, - seed_number = seed_number) - tempGiotto = subsetGiotto(gobject = gobject, - cell_ids = random_ids, - feat_type = feat_type, - spat_unit = spat_unit) - - # get random communication scores - randomScore = average_feat_feat_expression_in_groups(gobject = tempGiotto, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - feat_set_1 = feat_set_1, - feat_set_2 = feat_set_2) - randomScore = randomScore[(lig_cell_type == cell_type_1 & rec_cell_type == cell_type_2) | - (lig_cell_type == cell_type_2 & rec_cell_type == cell_type_1)] - - - - - # average random score - total_av = total_av + randomScore[['LR_expr']] - - # difference between observed and random - difference = comScore[['LR_expr']] - randomScore[['LR_expr']] - - # calculate total difference - if(detailed == FALSE) { - total_sum = total_sum+difference - } else { - total_sum[,sim] = difference - } - - # calculate p-values - difference[difference > 0] = 1 - difference[difference < 0] = -1 - total_bool = total_bool + difference - + if (!is.null(gene_set_2)) { + feat_set_2 <- gene_set_2 + warning("gene_set_2 is deprecated, use feat_set_2 in the future") } - comScore[, rand_expr := total_av/random_iter] - - if(detailed == TRUE) { - av_difference_scores = rowMeans_flex(total_sum) - sd_difference_scores = apply(total_sum, MARGIN = 1, FUN = 
stats::sd) - comScore[, av_diff := av_difference_scores] - comScore[, sd_diff := sd_difference_scores] - comScore[, z_score := (LR_expr - rand_expr)/sd_diff] + # data.table variables + from_to <- cell_ID <- lig_cell_type <- rec_cell_type <- lig_nr <- + rec_nr <- rand_expr <- NULL + av_diff <- log2fc <- LR_expr <- pvalue <- LR_cell_comb <- p.adj <- + LR_comb <- PI <- NULL + sd_diff <- z_score <- NULL + + # get parameters + adjust_method <- match.arg(adjust_method, choices = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + )) + adjust_target <- match.arg(adjust_target, choices = c("feats", "cells")) + + # metadata + cell_metadata <- pDataDT( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) - } else { - comScore[, av_diff := total_sum/random_iter] - } + # get annotated spatial network + annot_network <- annotateSpatialNetwork(gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) + cell_direction_1 <- paste0(cell_type_1, "-", cell_type_2) + cell_direction_2 <- paste0(cell_type_2, "-", cell_type_1) - comScore[, log2fc := log2((LR_expr+log2FC_addendum)/(rand_expr+log2FC_addendum))] - comScore[, pvalue := total_bool/random_iter] - comScore[, pvalue := ifelse(pvalue > 0, 1-pvalue, 1+pvalue)] - comScore[, LR_cell_comb := paste0(lig_cell_type,'--',rec_cell_type)] + subset_annot_network <- annot_network[from_to %in% c( + cell_direction_1, cell_direction_2)] - if(adjust_target == 'feats') { - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_cell_comb)] - } else if(adjust_target == 'cells'){ - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_comb)] + # make sure that there are sufficient observations + if (nrow(subset_annot_network) <= min_observations) { + return(NULL) + } else { + # subset giotto object to only interacting cells + subset_ids <- unique(c( + subset_annot_network$to, subset_annot_network$from)) + subsetGiotto <- subsetGiotto( + gobject = gobject, + cell_ids = subset_ids, + feat_type = feat_type, + spat_unit = spat_unit + ) + + # get information about number of cells + temp_meta <- pDataDT(subsetGiotto, + feat_type = feat_type, + spat_unit = spat_unit + ) + nr_cell_types <- temp_meta[cell_ID %in% subset_ids][ + , .N, by = c(cluster_column)] + nr_cells <- nr_cell_types$N + names(nr_cells) <- nr_cell_types$cell_types + + # get average communication scores + comScore <- average_feat_feat_expression_in_groups( + gobject = subsetGiotto, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2 + ) + comScore <- comScore[(lig_cell_type == cell_type_1 & + rec_cell_type == cell_type_2) | + (lig_cell_type == cell_type_2 & rec_cell_type == cell_type_1)] + + comScore[, lig_nr := nr_cells[lig_cell_type]] + comScore[, rec_nr := nr_cells[rec_cell_type]] + + # prepare for randomized scores + total_av <- rep(0, nrow(comScore)) + + if (detailed == FALSE) { + total_sum <- rep(0, nrow(comScore)) + } else { + total_sum <- matrix(nrow = nrow(comScore), ncol = random_iter) + } + + total_bool <- rep(0, nrow(comScore)) + + # identify which cell types you need + subset_metadata <- cell_metadata[cell_ID %in% subset_ids] + needed_cell_types <- subset_metadata[[cluster_column]] + + + + ## simulations ## + for (sim in 1:random_iter) { + if (verbose == TRUE) cat("simulation ", sim) + + # get random ids and subset + if (set_seed == TRUE) 
{ + seed_number <- seed_number + sim + set.seed(seed = seed_number) + } + random_ids <- .create_cell_type_random_cell_IDs( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + needed_cell_types = needed_cell_types, + set_seed = set_seed, + seed_number = seed_number + ) + tempGiotto <- subsetGiotto( + gobject = gobject, + cell_ids = random_ids, + feat_type = feat_type, + spat_unit = spat_unit + ) + + # get random communication scores + randomScore <- average_feat_feat_expression_in_groups( + gobject = tempGiotto, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2 + ) + randomScore <- randomScore[(lig_cell_type == cell_type_1 & + rec_cell_type == cell_type_2) | + (lig_cell_type == cell_type_2 & rec_cell_type == cell_type_1)] + + + + + # average random score + total_av <- total_av + randomScore[["LR_expr"]] + + # difference between observed and random + difference <- comScore[["LR_expr"]] - randomScore[["LR_expr"]] + + # calculate total difference + if (detailed == FALSE) { + total_sum <- total_sum + difference + } else { + total_sum[, sim] <- difference + } + + # calculate p-values + difference[difference > 0] <- 1 + difference[difference < 0] <- -1 + total_bool <- total_bool + difference + } + + comScore[, rand_expr := total_av / random_iter] + + if (detailed == TRUE) { + av_difference_scores <- rowMeans_flex(total_sum) + sd_difference_scores <- apply( + total_sum, MARGIN = 1, FUN = stats::sd) + + comScore[, av_diff := av_difference_scores] + comScore[, sd_diff := sd_difference_scores] + comScore[, z_score := (LR_expr - rand_expr) / sd_diff] + } else { + comScore[, av_diff := total_sum / random_iter] + } + + + comScore[, log2fc := log2((LR_expr + log2FC_addendum) / ( + rand_expr + log2FC_addendum))] + comScore[, pvalue := total_bool / random_iter] + comScore[, pvalue := ifelse(pvalue > 0, 1 - pvalue, 1 + pvalue)] + comScore[, LR_cell_comb := paste0(lig_cell_type, "--", rec_cell_type)] + + if (adjust_target == "feats") { + comScore[, p.adj := stats::p.adjust( + pvalue, method = adjust_method), by = .(LR_cell_comb)] + } else if (adjust_target == "cells") { + comScore[, p.adj := stats::p.adjust( + pvalue, method = adjust_method), by = .(LR_comb)] + } + + # get minimum adjusted p.value that is not zero + all_p.adj <- comScore[["p.adj"]] + lowest_p.adj <- min(all_p.adj[all_p.adj != 0]) + comScore[, PI := ifelse(p.adj == 0, log2fc * ( + -log10(lowest_p.adj)), log2fc * (-log10(p.adj)))] + + return(comScore) } - - # get minimum adjusted p.value that is not zero - all_p.adj = comScore[['p.adj']] - lowest_p.adj = min(all_p.adj[all_p.adj != 0]) - comScore[, PI := ifelse(p.adj == 0, log2fc*(-log10(lowest_p.adj)), log2fc*(-log10(p.adj)))] - - return(comScore) - - } } #' @title spatCellCellcom #' @name spatCellCellcom -#' @description Spatial Cell-Cell communication scores based on spatial expression of interacting cells +#' @description Spatial Cell-Cell communication scores based on spatial +#' expression of interacting cells #' @param gobject giotto object to use #' @param feat_type feature type #' @param spat_unit spatial unit -#' @param spatial_network_name spatial network to use for identifying interacting cells +#' @param spatial_network_name spatial network to use for identifying +#' interacting cells #' @param cluster_column cluster column with cell type information #' @param random_iter number of iterations #' @param feat_set_1 first specific feature set 
from feature pairs @@ -2601,8 +2840,10 @@ specificCellCellcommunicationScores = function(gobject, #' @param gene_set_1 deprecated, use feat_set_1 #' @param gene_set_2 deprecated, use feat_set_2 #' @param log2FC_addendum addendum to add when calculating log2FC -#' @param min_observations minimum number of interactions needed to be considered -#' @param detailed provide more detailed information (random variance and z-score) +#' @param min_observations minimum number of interactions needed to be +#' considered +#' @param detailed provide more detailed information +#' (random variance and z-score) #' @param adjust_method which method to adjust p-values #' @param adjust_target adjust multiple hypotheses at the cell or feature level #' @param do_parallel run calculations in parallel with mclapply @@ -2610,10 +2851,13 @@ specificCellCellcommunicationScores = function(gobject, #' @param set_seed set a seed for reproducibility #' @param seed_number seed number #' @param verbose verbose -#' @return Cell-Cell communication scores for feature pairs based on spatial interaction -#' @details Statistical framework to identify if pairs of genes (such as ligand-receptor combinations) -#' are expressed at higher levels than expected based on a reshuffled null distribution -#' of feature expression values in cells that are spatially in proximity to eachother.. +#' @return Cell-Cell communication scores for feature pairs based on spatial +#' interaction +#' @details Statistical framework to identify if pairs of genes +#' (such as ligand-receptor combinations) +#' are expressed at higher levels than expected based on a reshuffled null +#' distribution of feature expression values in cells that are spatially in +#' proximity to each other. #' \itemize{ #' \item{LR_comb:}{Pair of ligand and receptor} #' \item{lig_cell_type:}{ cell type to assess expression level of ligand } @@ -2636,149 +2880,158 @@ specificCellCellcommunicationScores = function(gobject, #' \item{PI:}{ significanc score: log2fc * -log10(p.adj) } #' } #' @export -spatCellCellcom = function(gobject, - feat_type = NULL, - spat_unit = NULL, - spatial_network_name = 'Delaunay_network', - cluster_column = 'cell_types', - random_iter = 1000, - feat_set_1, - feat_set_2, - gene_set_1 = NULL, - gene_set_2 = NULL, - log2FC_addendum = 0.1, - min_observations = 2, - detailed = FALSE, - adjust_method = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none"), - adjust_target = c('feats', 'cells'), - do_parallel = TRUE, - cores = NA, - set_seed = TRUE, - seed_number = 1234, - verbose = c('a little', 'a lot', 'none')) { - - - verbose = match.arg(verbose, choices = c('a little', 'a lot', 'none')) - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - ## check if spatial network exists ## - spat_networks = list_spatial_networks_names(gobject, - spat_unit = spat_unit) - - if(!spatial_network_name %in% spat_networks) { - stop(spatial_network_name, ' is not an existing spatial network \n', - 'use showNetworks() to see the available networks \n', - 'or create a new spatial network with createSpatialNetwork() \n') - } - - ## deprecated arguments - if(!is.null(gene_set_1)) { - feat_set_1 = gene_set_1 - warning('gene_set_1 is deprecated, use feat_set_1 in the future \n') - } - if(!is.null(gene_set_2)) { - feat_set_2 = gene_set_2 - warning('gene_set_2 is deprecated, use feat_set_2 in the 
future \n') - } - - - cell_metadata = pDataDT(gobject, - feat_type = feat_type, - spat_unit = spat_unit) - - ## get all combinations between cell types - all_uniq_values = unique(cell_metadata[[cluster_column]]) - same_DT = data.table::data.table(V1 = all_uniq_values, V2 = all_uniq_values) - combn_DT = data.table::as.data.table(t(combn(all_uniq_values, m = 2))) - combn_DT = rbind(same_DT, combn_DT) - - ## parallel option ## - if(do_parallel == TRUE) { - - savelist = lapply_flex(X = 1:nrow(combn_DT), future.seed=TRUE, cores = cores, fun = function(row) { - - cell_type_1 = combn_DT[row][['V1']] - cell_type_2 = combn_DT[row][['V2']] - - specific_scores = specificCellCellcommunicationScores(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - random_iter = random_iter, - cell_type_1 = cell_type_1, - cell_type_2 = cell_type_2, - feat_set_1 = feat_set_1, - feat_set_2 = feat_set_2, - spatial_network_name = spatial_network_name, - log2FC_addendum = log2FC_addendum, - min_observations = min_observations, - detailed = detailed, - adjust_method = adjust_method, - adjust_target = adjust_target, - set_seed = set_seed, - seed_number = seed_number) +spatCellCellcom <- function(gobject, + feat_type = NULL, + spat_unit = NULL, + spatial_network_name = "Delaunay_network", + cluster_column = "cell_types", + random_iter = 1000, + feat_set_1, + feat_set_2, + gene_set_1 = NULL, + gene_set_2 = NULL, + log2FC_addendum = 0.1, + min_observations = 2, + detailed = FALSE, + adjust_method = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + ), + adjust_target = c("feats", "cells"), + do_parallel = TRUE, + cores = NA, + set_seed = TRUE, + seed_number = 1234, + verbose = c("a little", "a lot", "none")) { + verbose <- match.arg(verbose, choices = c("a little", "a lot", "none")) + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) - }) + ## check if spatial network exists ## + spat_networks <- list_spatial_networks_names(gobject, + spat_unit = spat_unit + ) + if (!spatial_network_name %in% spat_networks) { + stop( + spatial_network_name, " is not an existing spatial network \n", + "use showNetworks() to see the available networks \n", + "or create a new spatial network with createSpatialNetwork()" + ) + } - } else { - - ## for loop over all combinations ## - savelist = list() - countdown = nrow(combn_DT) - - for(row in 1:nrow(combn_DT)) { - - cell_type_1 = combn_DT[row][['V1']] - cell_type_2 = combn_DT[row][['V2']] - - if(verbose == 'a little' | verbose == 'a lot') cat('\n\n PROCESS nr ', countdown,': ', cell_type_1, ' and ', cell_type_2, '\n\n') - - if(verbose %in% c('a little', 'none')) { - specific_verbose = F - } else { - specific_verbose = T - } - - specific_scores = specificCellCellcommunicationScores(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - cluster_column = cluster_column, - random_iter = random_iter, - cell_type_1 = cell_type_1, - cell_type_2 = cell_type_2, - feat_set_1 = feat_set_1, - feat_set_2 = feat_set_2, - spatial_network_name = spatial_network_name, - log2FC_addendum = log2FC_addendum, - min_observations = min_observations, - detailed = detailed, - adjust_method = adjust_method, - adjust_target = adjust_target, - set_seed = set_seed, - seed_number = seed_number, - verbose = specific_verbose) - savelist[[row]] = 
specific_scores - countdown = countdown - 1 + ## deprecated arguments + if (!is.null(gene_set_1)) { + feat_set_1 <- gene_set_1 + warning("gene_set_1 is deprecated, use feat_set_1 in the future") } + if (!is.null(gene_set_2)) { + feat_set_2 <- gene_set_2 + warning("gene_set_2 is deprecated, use feat_set_2 in the future") + } + - } + cell_metadata <- pDataDT(gobject, + feat_type = feat_type, + spat_unit = spat_unit + ) + + ## get all combinations between cell types + all_uniq_values <- unique(cell_metadata[[cluster_column]]) + same_DT <- data.table::data.table( + V1 = all_uniq_values, V2 = all_uniq_values) + combn_DT <- data.table::as.data.table(t(combn(all_uniq_values, m = 2))) + combn_DT <- rbind(same_DT, combn_DT) + + ## parallel option ## + if (do_parallel == TRUE) { + savelist <- lapply_flex( + X = 1:nrow(combn_DT), future.seed = TRUE, + cores = cores, fun = function(row) { + cell_type_1 <- combn_DT[row][["V1"]] + cell_type_2 <- combn_DT[row][["V2"]] + + specific_scores <- specificCellCellcommunicationScores( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + random_iter = random_iter, + cell_type_1 = cell_type_1, + cell_type_2 = cell_type_2, + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2, + spatial_network_name = spatial_network_name, + log2FC_addendum = log2FC_addendum, + min_observations = min_observations, + detailed = detailed, + adjust_method = adjust_method, + adjust_target = adjust_target, + set_seed = set_seed, + seed_number = seed_number + ) + }) + } else { + ## for loop over all combinations ## + savelist <- list() + countdown <- nrow(combn_DT) + + for (row in 1:nrow(combn_DT)) { + cell_type_1 <- combn_DT[row][["V1"]] + cell_type_2 <- combn_DT[row][["V2"]] + + if (verbose == "a little" | verbose == "a lot") + cat("PROCESS nr ", countdown, ": ", cell_type_1, " and ", + cell_type_2) + + if (verbose %in% c("a little", "none")) { + specific_verbose <- FALSE + } else { + specific_verbose <- TRUE + } + + specific_scores <- specificCellCellcommunicationScores( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + cluster_column = cluster_column, + random_iter = random_iter, + cell_type_1 = cell_type_1, + cell_type_2 = cell_type_2, + feat_set_1 = feat_set_1, + feat_set_2 = feat_set_2, + spatial_network_name = spatial_network_name, + log2FC_addendum = log2FC_addendum, + min_observations = min_observations, + detailed = detailed, + adjust_method = adjust_method, + adjust_target = adjust_target, + set_seed = set_seed, + seed_number = seed_number, + verbose = specific_verbose + ) + savelist[[row]] <- specific_scores + countdown <- countdown - 1 + } + } - finalDT = do.call('rbind', savelist) + finalDT <- do.call("rbind", savelist) - # data.table variables - LR_comb = LR_expr = NULL + # data.table variables + LR_comb <- LR_expr <- NULL - data.table::setorder(finalDT, LR_comb, -LR_expr) + data.table::setorder(finalDT, LR_comb, -LR_expr) - return(finalDT) + return(finalDT) } @@ -2786,7 +3039,8 @@ spatCellCellcom = function(gobject, #' @title combCCcom #' @name combCCcom -#' @description Combine spatial and expression based cell-cell communication data.tables +#' @description Combine spatial and expression based cell-cell communication +#' data.tables #' @param spatialCC spatial cell-cell communication scores #' @param exprCC expression cell-cell communication scores #' @param min_lig_nr minimum number of ligand cells @@ -2794,57 +3048,67 @@ spatCellCellcom = function(gobject, #' @param min_padj_value minimum 
adjusted p-value #' @param min_log2fc minimum log2 fold-change #' @param min_av_diff minimum average expression difference -#' @param detailed detailed option used with \code{\link{spatCellCellcom}} (default = FALSE) +#' @param detailed detailed option used with \code{\link{spatCellCellcom}} +#' (default = FALSE) #' @return combined data.table with spatial and expression communication data #' @export -combCCcom = function(spatialCC, - exprCC, - min_lig_nr = 3, - min_rec_nr = 3, - min_padj_value = 1, - min_log2fc = 0, - min_av_diff = 0, - detailed = FALSE) { - - - # data.table variables - lig_nr = rec_nr = p.adj = log2fc = av_diff = NULL - - spatialCC = spatialCC[lig_nr >= min_lig_nr & rec_nr >= min_rec_nr & - p.adj <= min_padj_value & abs(log2fc) >= min_log2fc & abs(av_diff) >= min_av_diff] - - - if(detailed == TRUE) { - old_detailed = c('sd_diff', 'z_score') - new_detailed = c('sd_diff_spat', 'z_score_spat') - } else { - old_detailed = NULL - new_detailed = NULL - } - +combCCcom <- function(spatialCC, + exprCC, + min_lig_nr = 3, + min_rec_nr = 3, + min_padj_value = 1, + min_log2fc = 0, + min_av_diff = 0, + detailed = FALSE) { + # data.table variables + lig_nr <- rec_nr <- p.adj <- log2fc <- av_diff <- NULL + + spatialCC <- spatialCC[lig_nr >= min_lig_nr & rec_nr >= min_rec_nr & + p.adj <= min_padj_value & abs(log2fc) >= min_log2fc & + abs(av_diff) >= min_av_diff] + + + if (detailed == TRUE) { + old_detailed <- c("sd_diff", "z_score") + new_detailed <- c("sd_diff_spat", "z_score_spat") + } else { + old_detailed <- NULL + new_detailed <- NULL + } - data.table::setnames(x = spatialCC, - old = c('lig_expr', 'rec_expr', 'LR_expr', 'lig_nr', 'rec_nr', - 'rand_expr', 'av_diff', old_detailed, 'log2fc', 'pvalue', 'p.adj', 'PI'), - new = c('lig_expr_spat', 'rec_expr_spat', 'LR_expr_spat', 'lig_nr_spat', 'rec_nr_spat', - 'rand_expr_spat', 'av_diff_spat', new_detailed, 'log2fc_spat', 'pvalue_spat', 'p.adj_spat', 'PI_spat')) - merge_DT = data.table::merge.data.table(spatialCC, exprCC, by = c('LR_comb', 'LR_cell_comb', - 'lig_cell_type', 'rec_cell_type', - 'ligand', 'receptor')) + data.table::setnames( + x = spatialCC, + old = c( + "lig_expr", "rec_expr", "LR_expr", "lig_nr", "rec_nr", + "rand_expr", "av_diff", old_detailed, + "log2fc", "pvalue", "p.adj", "PI" + ), + new = c( + "lig_expr_spat", "rec_expr_spat", "LR_expr_spat", "lig_nr_spat", + "rec_nr_spat", + "rand_expr_spat", "av_diff_spat", new_detailed, "log2fc_spat", + "pvalue_spat", "p.adj_spat", "PI_spat" + ) + ) - # data.table variables - LR_expr_rnk = LR_expr = LR_comb = LR_spat_rnk = LR_expr_spat = exprPI_rnk = PI = spatPI_rnk = PI_spat = NULL + merge_DT <- data.table::merge.data.table(spatialCC, exprCC, by = c( + "LR_comb", "LR_cell_comb", + "lig_cell_type", "rec_cell_type", + "ligand", "receptor" + )) - # rank for expression levels - merge_DT[, LR_expr_rnk := rank(-LR_expr), by = LR_comb] - merge_DT[, LR_spat_rnk := rank(-LR_expr_spat), by = LR_comb] + # data.table variables + LR_expr_rnk <- LR_expr <- LR_comb <- LR_spat_rnk <- LR_expr_spat <- + exprPI_rnk <- PI <- spatPI_rnk <- PI_spat <- NULL - # rank for differential activity levels - merge_DT[, exprPI_rnk := rank(-PI), by = LR_comb] - merge_DT[, spatPI_rnk := rank(-PI_spat), by = LR_comb] + # rank for expression levels + merge_DT[, LR_expr_rnk := rank(-LR_expr), by = LR_comb] + merge_DT[, LR_spat_rnk := rank(-LR_expr_spat), by = LR_comb] - return(merge_DT) + # rank for differential activity levels + merge_DT[, exprPI_rnk := rank(-PI), by = LR_comb] + merge_DT[, spatPI_rnk := 
rank(-PI_spat), by = LR_comb] + return(merge_DT) } - diff --git a/R/spatial_interaction_spot.R b/R/spatial_interaction_spot.R index 5a6c3478a..7c907c749 100644 --- a/R/spatial_interaction_spot.R +++ b/R/spatial_interaction_spot.R @@ -8,110 +8,122 @@ #' @name cell_proximity_spots #' @description Compute cell-cell interactions observed value for internal and #' external spots -#' @param dwls_values data.table of cell type enrichment in each spot and multiply -#' by cell number in each spot +#' @param dwls_values data.table of cell type enrichment in each spot and +#' multiply by cell number in each spot #' @return List of cell proximity observed value in data.table format. Columns: #' unified_int, type_int, V1, external, internal. NULL -#' @describeIn cell_proximity_spots Compute cell-cell interactions observed value inner each spot +#' @describeIn cell_proximity_spots Compute cell-cell interactions observed +#' value inner each spot #' @param cell_IDs cell_IDs #' @keywords internal -.cell_proximity_spots_internal = function(cell_IDs, - dwls_values){ - - # data.table variables - value = unified_int = Var1 = Var2 = internal = NULL - - proximity_dt = data.table::data.table() - # calculate proximity for each spot - for (cell_i in seq_along(cell_IDs)){ - cell_ID = cell_IDs[cell_i] - # dwls value for one spot and remove 0 cell type - dwls_spot = dwls_values[cell_ID,] - dwls_spot = dwls_spot[dwls_spot > 0] - - # calculate proximity of same cell type (A==B) - same_ct = data.table::data.table() - if (length(dwls_spot) >= 1){ - same_ct = (dwls_spot-1) * dwls_spot / 2 - # transfer format - unified_int_same = names(same_ct) - unified_int_same = paste0(unified_int_same,'--',unified_int_same) - same_ct = data.table::data.table('unified_int' = unified_int_same,'internal' = same_ct) +.cell_proximity_spots_internal <- function(cell_IDs, + dwls_values) { + # data.table variables + value <- unified_int <- Var1 <- Var2 <- internal <- NULL + + proximity_dt <- data.table::data.table() + # calculate proximity for each spot + for (cell_i in seq_along(cell_IDs)) { + cell_ID <- cell_IDs[cell_i] + # dwls value for one spot and remove 0 cell type + dwls_spot <- dwls_values[cell_ID, ] + dwls_spot <- dwls_spot[dwls_spot > 0] + + # calculate proximity of same cell type (A==B) + same_ct <- data.table::data.table() + if (length(dwls_spot) >= 1) { + same_ct <- (dwls_spot - 1) * dwls_spot / 2 + # transfer format + unified_int_same <- names(same_ct) + unified_int_same <- paste0(unified_int_same, "--", unified_int_same) + same_ct <- data.table::data.table( + "unified_int" = unified_int_same, "internal" = same_ct) + } + + # calculate proximity of different cell type (A==B) + diff_ct <- data.table::data.table() + if (length(dwls_spot) >= 2) { + diff_ct <- dwls_spot %o% dwls_spot + # modifiy duplicate value + diag(diff_ct) <- NA + diff_ct[lower.tri(diff_ct)] <- NA + # transfer format to data.table + diff_ct <- data.table::as.data.table(reshape2::melt(diff_ct)) + diff_ct <- diff_ct[value != "NA"] + diff_ct[, c("Var1", "Var2") := lapply( + .SD, as.character), .SDcols = c("Var1", "Var2")] + diff_ct[, unified_int := ifelse( + Var1 < Var2, paste0(Var1, "--", Var2), + paste0(Var2, "--", Var1))] + diff_ct <- diff_ct[, c("unified_int", "value")] + data.table::setnames( + diff_ct, old = c("value"), new = c("internal")) + } + + # merge spot proximity to proximity data.table + proximity_dt <- rbind(proximity_dt, same_ct, diff_ct) } - # calculate proximity of different cell type (A==B) - diff_ct = data.table::data.table() - if 
(length(dwls_spot) >= 2){ - diff_ct = dwls_spot %o% dwls_spot - #modifiy duplicate value - diag(diff_ct) = NA - diff_ct[lower.tri(diff_ct)] = NA - # transfer format to data.table - diff_ct = data.table::as.data.table(reshape2::melt(diff_ct)) - diff_ct = diff_ct[value != 'NA' ] - diff_ct[, c('Var1', 'Var2') := lapply(.SD, as.character),.SDcols = c('Var1', 'Var2')] - diff_ct[, unified_int := ifelse(Var1 < Var2, paste0(Var1,'--',Var2), paste0(Var2,'--',Var1))] - diff_ct = diff_ct[, c('unified_int', 'value')] - data.table::setnames(diff_ct, old = c('value'), new = c('internal')) - } - - # merge spot proximity to proximity data.table - proximity_dt = rbind(proximity_dt, same_ct, diff_ct) - } + proximity_dt <- proximity_dt[internal > 0] + proximity_dt[, internal := sum(internal), by = c("unified_int")] + proximity_dt <- unique(proximity_dt) - proximity_dt = proximity_dt[internal > 0] - proximity_dt[, internal := sum(internal), by = c('unified_int')] - proximity_dt = unique(proximity_dt) - - return(proximity_dt) + return(proximity_dt) } -#' @describeIn cell_proximity_spots Compute cell-cell interactions observed value for interacted spots +#' @describeIn cell_proximity_spots Compute cell-cell interactions observed +#' value for interacted spots #' @param pairs data.table of paired spots. Format: cell_ID1, cell_ID2, N #' @keywords internal .cell_proximity_spots_external <- function( - pairs, - dwls_values -) { - - cell_IDs = unique(c(pairs$from, pairs$to)) - pairs = pairs[, .N, by = c('from','to')] - # add internal pairs to make full matrix - pairs_spots = data.table::data.table(from = cell_IDs, to = cell_IDs, N = 0) - pairs_balance = data.table::data.table(from = pairs$to, to = pairs$from, N = pairs$N) - pairs_for_mat = rbind(pairs_spots, pairs, pairs_balance) - pairs_for_mat = pairs_for_mat[, .N, by = c('from','to')] - - # make square matrix of interaction between spots - pairs_mat = reshape2::acast(pairs_for_mat, from ~ to, value.var = 'N' ,fill = 0) - pairs_mat = pairs_mat[cell_IDs,cell_IDs] - - #calculate cell-type/cell-type interactions - dwls_sub = dwls_values[cell_IDs,] - proximity_dt = data.table::data.table() - cts = colnames(dwls_sub) - cts = mixedsort(cts) - for (i in seq_along(cts)){ - ct1 = cts[i] - dwls_ct1 = dwls_sub[, ct1] - - for (j in i:length(cts)){ - ct2 = cts[j] - dwls_ct2 = dwls_sub[, ct2] - if (i == j ){f = 0.5}else{f=1} - proximity_2cts = dwls_ct1 %o% dwls_ct2 * pairs_mat * f - proximity_2cts = sum(proximity_2cts) - proximity_2cts = data.table::data.table(unified_int = paste0(ct1,'--',ct2), - external = proximity_2cts) - proximity_dt = rbind(proximity_dt, proximity_2cts) + pairs, + dwls_values) { + cell_IDs <- unique(c(pairs$from, pairs$to)) + pairs <- pairs[, .N, by = c("from", "to")] + # add internal pairs to make full matrix + pairs_spots <- data.table::data.table(from = cell_IDs, to = cell_IDs, N = 0) + pairs_balance <- data.table::data.table( + from = pairs$to, to = pairs$from, N = pairs$N) + pairs_for_mat <- rbind(pairs_spots, pairs, pairs_balance) + pairs_for_mat <- pairs_for_mat[, .N, by = c("from", "to")] + + # make square matrix of interaction between spots + pairs_mat <- reshape2::acast( + pairs_for_mat, from ~ to, value.var = "N", fill = 0) + pairs_mat <- pairs_mat[cell_IDs, cell_IDs] + + # calculate cell-type/cell-type interactions + dwls_sub <- dwls_values[cell_IDs, ] + proximity_dt <- data.table::data.table() + cts <- colnames(dwls_sub) + cts <- mixedsort(cts) + for (i in seq_along(cts)) { + ct1 <- cts[i] + dwls_ct1 <- dwls_sub[, ct1] + + for (j in 
i:length(cts)) { + ct2 <- cts[j] + dwls_ct2 <- dwls_sub[, ct2] + if (i == j) { + f <- 0.5 + } else { + f <- 1 + } + proximity_2cts <- dwls_ct1 %o% dwls_ct2 * pairs_mat * f + proximity_2cts <- sum(proximity_2cts) + proximity_2cts <- data.table::data.table( + unified_int = paste0(ct1, "--", ct2), + external = proximity_2cts + ) + proximity_dt <- rbind(proximity_dt, proximity_2cts) + } } - } - return(proximity_dt) + return(proximity_dt) } @@ -121,43 +133,51 @@ NULL #' @param pairs_external data.table of paired spots. Format: cell_ID1, cell_ID2, #' N. Passes to `.cell_proximity_spots_external` `pairs` param #' @keywords internal -.cell_proximity_spots = function(cell_IDs, - pairs_external, - dwls_values){ - - # data.table variables - V1 = internal = external = s1 = s2 = unified_int = type_int = NULL - - # compute cell-type/cell-type interactions in each spot (internal) - if (length(cell_IDs) > 0){ - proximity_in = .cell_proximity_spots_internal(cell_IDs = cell_IDs, - dwls_values = dwls_values) - } - - # compute cell-type/cell-type interactions between spots (external) - # get paired spots barcodes - proximity_ex = .cell_proximity_spots_external(pairs = pairs_external, - dwls_values = dwls_values) - - if (length(cell_IDs) > 0) { - proximity_dt = merge(proximity_ex, proximity_in, by= 'unified_int', all=TRUE) - }else{ - proximity_dt = proximity_ex[, 'internal' := 0] - } - proximity_dt[is.na(proximity_dt)] = 0 - proximity_dt[, V1 := internal + external] - - proximity_dt[, s1 := strsplit(as.character(unified_int), split = '--')[[1]][1], by = 1:nrow(proximity_dt)] - proximity_dt[, s2 := strsplit(as.character(unified_int), split = '--')[[1]][2], by = 1:nrow(proximity_dt)] - proximity_dt[, type_int := ifelse(s1 == s2, 'homo', 'hetero')] - proximity_dt = proximity_dt[, c('unified_int', 'type_int', 'V1', 'external', 'internal')] - return(proximity_dt) +.cell_proximity_spots <- function(cell_IDs, + pairs_external, + dwls_values) { + # data.table variables + V1 <- internal <- external <- s1 <- s2 <- unified_int <- type_int <- NULL + + # compute cell-type/cell-type interactions in each spot (internal) + if (length(cell_IDs) > 0) { + proximity_in <- .cell_proximity_spots_internal( + cell_IDs = cell_IDs, + dwls_values = dwls_values + ) + } + + # compute cell-type/cell-type interactions between spots (external) + # get paired spots barcodes + proximity_ex <- .cell_proximity_spots_external( + pairs = pairs_external, + dwls_values = dwls_values + ) + + if (length(cell_IDs) > 0) { + proximity_dt <- merge( + proximity_ex, proximity_in, by = "unified_int", all = TRUE) + } else { + proximity_dt <- proximity_ex[, "internal" := 0] + } + proximity_dt[is.na(proximity_dt)] <- 0 + proximity_dt[, V1 := internal + external] + + proximity_dt[, s1 := strsplit(as.character( + unified_int), split = "--")[[1]][1], by = 1:nrow(proximity_dt)] + proximity_dt[, s2 := strsplit(as.character( + unified_int), split = "--")[[1]][2], by = 1:nrow(proximity_dt)] + proximity_dt[, type_int := ifelse(s1 == s2, "homo", "hetero")] + proximity_dt <- proximity_dt[ + , c("unified_int", "type_int", "V1", "external", "internal")] + return(proximity_dt) } #' @title cellProximityEnrichmentSpots #' @name cellProximityEnrichmentSpots -#' @description Compute cell-cell interaction enrichment for spots (observed vs expected) +#' @description Compute cell-cell interaction enrichment for spots +#' (observed vs expected) #' #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. 
'cell') @@ -165,191 +185,228 @@ NULL #' @param spatial_network_name name of spatial network to use #' @param cluster_column name of column to use for clusters #' @param cells_in_spot cell number in each spot -#' @param number_of_simulations number of simulations to create expected observations -#' @param adjust_method method to adjust p.values (e.g. "none", "fdr", "bonferroni","BH","holm", "hochberg", "hommel","BY") +#' @param number_of_simulations number of simulations to create expected +#' observations +#' @param adjust_method method to adjust p.values +#' (e.g. "none", "fdr", "bonferroni","BH","holm", "hochberg", "hommel","BY") #' @param set_seed use of seed. Default = TRUE #' @param seed_number seed number to use. Default = 1234 #' @param verbose be verbose #' -#' @return List of cell Proximity scores (CPscores) in data.table format. The first -#' data.table (raw_sim_table) shows the raw observations of both the original and -#' simulated networks. The second data.table (enrichm_res) shows the enrichment results. -#' @details Spatial proximity enrichment or depletion between pairs of cell types -#' is calculated by calculating the observed over the expected frequency -#' of cell-cell proximity interactions. The expected frequency is the average frequency -#' calculated from a number of spatial network simulations. Each individual simulation is -#' obtained by reshuffling the cell type labels of each node (spot) -#' in the spatial network. +#' @return List of cell Proximity scores (CPscores) in data.table format. +#' The first +#' data.table (raw_sim_table) shows the raw observations of both the original +#' and simulated networks. The second data.table (enrichm_res) shows the +#' enrichment results. +#' @details Spatial proximity enrichment or depletion between pairs of cell +#' types is calculated by calculating the observed over the expected frequency +#' of cell-cell proximity interactions. The expected frequency is the average +#' frequency calculated from a number of spatial network simulations. Each +#' individual simulation is obtained by reshuffling the cell type labels of +#' each node (spot) in the spatial network. 
#' @export cellProximityEnrichmentSpots <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network_name = 'spatial_network', - cluster_column = 'cell_ID', - cells_in_spot = 1, - number_of_simulations = 100, - adjust_method = c("none", "fdr", "bonferroni","BH", - "holm", "hochberg", "hommel", - "BY"), - set_seed = TRUE, - seed_number = 1234, - verbose = FALSE) { - - # p.adj test - sel_adjust_method = match.arg(adjust_method, choices = c("none", "fdr", "bonferroni","BH", - "holm", "hochberg", "hommel", - "BY")) - - spatial_network_annot = annotateSpatialNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - # data.table variables - orig = from = to = unified_int = unified_cells = type_int = N = V1 = original = enrichm = simulations = NULL - - spatial_network_annot = dt_sort_combine_two_columns(spatial_network_annot, 'to', 'from', 'unified_cells') - spatial_network_annot = spatial_network_annot[!duplicated(unified_cells)] - - # exact spatial_enrichment matrix - dwls_values = get_spatial_enrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - enrichm_name = 'DWLS', - output = 'data.table') - data.table::setDF(dwls_values) - rownames_dwls = dwls_values[,'cell_ID'] - dwls_values = as.matrix(dwls_values[,-1]) - rownames(dwls_values) = rownames_dwls - dwls_values_adjust = dwls_values * cells_in_spot - - # compute cell-type/cell-type interactions - if(verbose) print("1/5 Computing cell-type/cell-type interactions") - - orig_pairs_external = spatial_network_annot[, .N, by = c('from', 'to')] - table_orig_results = .cell_proximity_spots(cell_IDs = pDataDT(gobject)$cell_ID, - pairs_external = orig_pairs_external, - dwls_values = dwls_values_adjust) - table_orig_results[, orig := 'original'] - table_orig_results[, round := 'original'] - - # make simulated network - if(verbose) print("2/5 Make simulated network") - - sample_dt = make_simulated_network(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column, - number_of_simulations = number_of_simulations, - set_seed = set_seed, - seed_number = seed_number) - - # method for get simulation cell-type/cell-type interaction for each round - data.table::setnames(sample_dt, old = c('s1', 's2'), new = c('from', 'to')) - table_sim_results = NULL - for(sim in 1:number_of_simulations) { - r = paste0('sim',sim) - sim_pairs = sample_dt[round == r, c("from","to")] - - sim_cell_IDs = unique(sim_pairs[from == to, from]) - sim_pairs_ex = sim_pairs[from != to, ] - sim_pairs_ex[, N :=1] - - sim_dt_round = .cell_proximity_spots(cell_IDs = sim_cell_IDs, - pairs_external = sim_pairs_ex, - dwls_values = dwls_values_adjust) - - sim_dt_round[, orig := 'simulations'] - sim_dt_round[, round := r] - table_sim_results = rbind(table_sim_results, sim_dt_round) - } - - - table_results = rbind(table_orig_results, table_sim_results) - - # add missing combinations from original or simulations - # probably not needed anymore - all_sim_ints = as.character(unique(table_results[orig == 'simulations']$unified_int)) - all_orig_ints = as.character(unique(table_results[orig == 'original']$unified_int)) - missing_in_orig = all_sim_ints[!all_sim_ints %in% all_orig_ints] - missing_in_sim = all_orig_ints[!all_orig_ints %in% all_sim_ints] - create_missing_for_orig = table_results[unified_int %in% missing_in_orig] - create_missing_for_orig = 
unique(create_missing_for_orig[, c('orig', 'V1') := list('original', 0)]) - create_missing_for_sim = table_results[unified_int %in% missing_in_sim] - create_missing_for_sim = unique(create_missing_for_sim[, c('orig', 'V1') := list('simulations', 0)]) - - table_results <- do.call('rbind', list(table_results, create_missing_for_orig, create_missing_for_sim)) - - ## p-values - if(verbose) print("3/5 Calculating p-values") - - combo_list = rep(NA, length = length(unique(table_results$unified_int))) - p_high = rep(NA, length = length(unique(table_results$unified_int))) - p_low = rep(NA, length = length(unique(table_results$unified_int))) - - for(int_combo in seq_along(unique(table_results$unified_int))) { - - this_combo = as.character(unique(table_results$unified_int)[int_combo]) - - sub = table_results[unified_int == this_combo] - - orig_value = sub[orig == 'original']$V1 - sim_values = sub[orig == 'simulations']$V1 - - length_simulations = length(sim_values) - if(length_simulations != number_of_simulations) { - additional_length_needed = number_of_simulations-length_simulations - sim_values = c(sim_values, rep(0, additional_length_needed)) - #length_simulations = c(length_simulations, rep(0, additional_length_needed)) - } + spat_unit = NULL, + feat_type = NULL, + spatial_network_name = "spatial_network", + cluster_column = "cell_ID", + cells_in_spot = 1, + number_of_simulations = 100, + adjust_method = c( + "none", "fdr", "bonferroni", "BH", + "holm", "hochberg", "hommel", + "BY" + ), + set_seed = TRUE, + seed_number = 1234, + verbose = FALSE) { + # p.adj test + sel_adjust_method <- match.arg(adjust_method, choices = c( + "none", "fdr", "bonferroni", "BH", + "holm", "hochberg", "hommel", + "BY" + )) + + spatial_network_annot <- annotateSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) - p_orig_higher = 1 - (sum((orig_value+1) > (sim_values+1))/number_of_simulations) - p_orig_lower = 1 - (sum((orig_value+1) < (sim_values+1))/number_of_simulations) + # data.table variables + orig <- from <- to <- unified_int <- unified_cells <- type_int <- N <- + V1 <- original <- enrichm <- simulations <- NULL + + spatial_network_annot <- dt_sort_combine_two_columns( + spatial_network_annot, "to", "from", "unified_cells") + spatial_network_annot <- spatial_network_annot[!duplicated(unified_cells)] + + # exact spatial_enrichment matrix + dwls_values <- getSpatialEnrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = "DWLS", + output = "data.table" + ) + data.table::setDF(dwls_values) + rownames_dwls <- dwls_values[, "cell_ID"] + dwls_values <- as.matrix(dwls_values[, -1]) + rownames(dwls_values) <- rownames_dwls + dwls_values_adjust <- dwls_values * cells_in_spot + + # compute cell-type/cell-type interactions + if (verbose) message("1/5 Computing cell-type/cell-type interactions") + + orig_pairs_external <- spatial_network_annot[, .N, by = c("from", "to")] + table_orig_results <- .cell_proximity_spots( + cell_IDs = pDataDT(gobject)$cell_ID, + pairs_external = orig_pairs_external, + dwls_values = dwls_values_adjust + ) + table_orig_results[, orig := "original"] + table_orig_results[, round := "original"] + + # make simulated network + if (verbose) message("2/5 Make simulated network") + + sample_dt <- make_simulated_network( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_network_name = spatial_network_name, + 
cluster_column = cluster_column, + number_of_simulations = number_of_simulations, + set_seed = set_seed, + seed_number = seed_number + ) + + # method for get simulation cell-type/cell-type interaction for each round + data.table::setnames(sample_dt, old = c("s1", "s2"), new = c("from", "to")) + table_sim_results <- NULL + for (sim in 1:number_of_simulations) { + r <- paste0("sim", sim) + sim_pairs <- sample_dt[round == r, c("from", "to")] + + sim_cell_IDs <- unique(sim_pairs[from == to, from]) + sim_pairs_ex <- sim_pairs[from != to, ] + sim_pairs_ex[, N := 1] + + sim_dt_round <- .cell_proximity_spots( + cell_IDs = sim_cell_IDs, + pairs_external = sim_pairs_ex, + dwls_values = dwls_values_adjust + ) + + sim_dt_round[, orig := "simulations"] + sim_dt_round[, round := r] + table_sim_results <- rbind(table_sim_results, sim_dt_round) + } - combo_list[[int_combo]] = this_combo - p_high[[int_combo]] = p_orig_higher - p_low[[int_combo]] = p_orig_lower - } - res_pvalue_DT = data.table::data.table(unified_int = as.vector(combo_list), p_higher_orig = p_high, p_lower_orig = p_low) + table_results <- rbind(table_orig_results, table_sim_results) + + # add missing combinations from original or simulations + # probably not needed anymore + all_sim_ints <- as.character(unique(table_results[ + orig == "simulations"]$unified_int)) + all_orig_ints <- as.character(unique(table_results[ + orig == "original"]$unified_int)) + missing_in_orig <- all_sim_ints[!all_sim_ints %in% all_orig_ints] + missing_in_sim <- all_orig_ints[!all_orig_ints %in% all_sim_ints] + create_missing_for_orig <- table_results[unified_int %in% missing_in_orig] + create_missing_for_orig <- unique(create_missing_for_orig[ + , c("orig", "V1") := list("original", 0)]) + create_missing_for_sim <- table_results[unified_int %in% missing_in_sim] + create_missing_for_sim <- unique(create_missing_for_sim[ + , c("orig", "V1") := list("simulations", 0)]) + + table_results <- do.call( + "rbind", + list(table_results, create_missing_for_orig, create_missing_for_sim)) + + ## p-values + if (verbose) message("3/5 Calculating p-values") + + combo_list <- rep(NA, length = length(unique(table_results$unified_int))) + p_high <- rep(NA, length = length(unique(table_results$unified_int))) + p_low <- rep(NA, length = length(unique(table_results$unified_int))) + + for (int_combo in seq_along(unique(table_results$unified_int))) { + this_combo <- as.character(unique(table_results$unified_int)[int_combo]) + + sub <- table_results[unified_int == this_combo] + + orig_value <- sub[orig == "original"]$V1 + sim_values <- sub[orig == "simulations"]$V1 + + length_simulations <- length(sim_values) + if (length_simulations != number_of_simulations) { + additional_length_needed <- number_of_simulations - + length_simulations + sim_values <- c(sim_values, rep(0, additional_length_needed)) + } + + p_orig_higher <- 1 - (sum((orig_value + 1) > (sim_values + 1)) / + number_of_simulations) + p_orig_lower <- 1 - (sum((orig_value + 1) < (sim_values + 1)) / + number_of_simulations) + + combo_list[[int_combo]] <- this_combo + p_high[[int_combo]] <- p_orig_higher + p_low[[int_combo]] <- p_orig_lower + } + res_pvalue_DT <- data.table::data.table( + unified_int = as.vector(combo_list), + p_higher_orig = p_high, + p_lower_orig = p_low) - # depletion or enrichment in barplot format - if(verbose) print("4/5 Depletion or enrichment in barplot format") + # depletion or enrichment in barplot format + if (verbose) message("4/5 Depletion or enrichment in barplot format") - table_mean_results <- 
table_results[, .(mean(V1)), by = c('orig', 'unified_int', 'type_int')] - table_mean_results_dc <- data.table::dcast.data.table(data = table_mean_results, formula = type_int+unified_int~orig, value.var = 'V1') - table_mean_results_dc[, original := ifelse(is.na(original), 0, original)] - table_mean_results_dc[, enrichm := log2((original+1)/(simulations+1))] + table_mean_results <- table_results[ + , .(mean(V1)), by = c("orig", "unified_int", "type_int")] + table_mean_results_dc <- data.table::dcast.data.table( + data = table_mean_results, + formula = type_int + unified_int ~ orig, value.var = "V1") + table_mean_results_dc[, original := ifelse(is.na(original), 0, original)] + table_mean_results_dc[, enrichm := log2((original + 1) / (simulations + 1))] - table_mean_results_dc <- merge(table_mean_results_dc, res_pvalue_DT, by = 'unified_int') - data.table::setorder(table_mean_results_dc, enrichm) - table_mean_results_dc[, unified_int := factor(unified_int, unified_int)] + table_mean_results_dc <- merge( + table_mean_results_dc, res_pvalue_DT, by = "unified_int") + data.table::setorder(table_mean_results_dc, enrichm) + table_mean_results_dc[, unified_int := factor(unified_int, unified_int)] - # adjust p-values for mht + # adjust p-values for mht - if(verbose) print("5/5 Calculating adjust p-values for mht") + if (verbose) message("5/5 Calculating adjust p-values for mht") - # data.table variables - p.adj_higher = p.adj_lower = p_lower_orig = p_higher_orig = PI_value = int_ranking = NULL + # data.table variables + p.adj_higher <- p.adj_lower <- p_lower_orig <- p_higher_orig <- + PI_value <- int_ranking <- NULL - table_mean_results_dc[, p.adj_higher := stats::p.adjust(p_higher_orig, method = sel_adjust_method)] - table_mean_results_dc[, p.adj_lower := stats::p.adjust(p_lower_orig, method = sel_adjust_method)] + table_mean_results_dc[, p.adj_higher := stats::p.adjust( + p_higher_orig, method = sel_adjust_method)] + table_mean_results_dc[, p.adj_lower := stats::p.adjust( + p_lower_orig, method = sel_adjust_method)] - table_mean_results_dc[, PI_value := ifelse(p.adj_higher <= p.adj_lower, - -log10(p.adj_higher+(1/number_of_simulations))*enrichm, - -log10(p.adj_lower+(1/number_of_simulations))*enrichm)] - data.table::setorder(table_mean_results_dc, PI_value) + table_mean_results_dc[, PI_value := ifelse(p.adj_higher <= p.adj_lower, + -log10(p.adj_higher + (1 / number_of_simulations)) * enrichm, + -log10(p.adj_lower + (1 / number_of_simulations)) * enrichm + )] + data.table::setorder(table_mean_results_dc, PI_value) - # order - table_mean_results_dc <- table_mean_results_dc[order(-PI_value)] - table_mean_results_dc[, int_ranking := 1:.N] + # order + table_mean_results_dc <- table_mean_results_dc[order(-PI_value)] + table_mean_results_dc[, int_ranking := 1:.N] - return(list(raw_sim_table = table_results, enrichm_res = table_mean_results_dc)) + return(list( + raw_sim_table = table_results, enrichm_res = table_mean_results_dc)) } @@ -361,8 +418,8 @@ cellProximityEnrichmentSpots <- function(gobject, #' @title featExpDWLS #' @name featExpDWLS -#' @description Compute predicted feature expression value by spatialDWSL results and -#' average feature expression for cell type +#' @description Compute predicted feature expression value by spatialDWSL +#' results and average feature expression for cell type #' #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. 
'cell') @@ -371,86 +428,95 @@ cellProximityEnrichmentSpots <- function(gobject, #' #' @return matrix #' @export -featExpDWLS = function(gobject, - spat_unit = NULL, - feat_type = NULL, - ave_celltype_exp){ - - # exact spatial_enrichment matrix - dwls_values = get_spatial_enrichment(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - enrichm_name = 'DWLS', - output = 'data.table') - - # 1. check if cell_type_vector and matrix are compatible - if(ncol(ave_celltype_exp) != ncol(dwls_values) - 1) { - stop('ncol(ave_celltype_exp) needs to be the same as ncol(dwls_values) - 1') - } - - cell_types = colnames(ave_celltype_exp) - data.table::setcolorder(dwls_values,c('cell_ID',cell_types)) - - # 2. for each spot - # calculate dwls predicted expression for features - expMatrixDWLS = matrix(data = NA, - nrow = nrow(ave_celltype_exp), - ncol = nrow(dwls_values)) - - average_exp = as.matrix(ave_celltype_exp) - for(spot_i in 1:nrow(dwls_values)){ - spot = dwls_values[spot_i,1] - spot_dwls = dwls_values[spot_i, -1] - data.table::setDF(spot_dwls) - spot_dwls = as.vector(t(spot_dwls)[,1]) - - spot_exp = average_exp %*% spot_dwls - - expMatrixDWLS[, spot_i] = spot_exp - } - rownames(expMatrixDWLS) = rownames(ave_celltype_exp) - colnames(expMatrixDWLS) = dwls_values$cell_ID - - return(expMatrixDWLS) +featExpDWLS <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + ave_celltype_exp) { + # exact spatial_enrichment matrix + dwls_values <- getSpatialEnrichment(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = "DWLS", + output = "data.table" + ) + + # 1. check if cell_type_vector and matrix are compatible + if (ncol(ave_celltype_exp) != ncol(dwls_values) - 1) { + stop("ncol(ave_celltype_exp) needs to be the same as + ncol(dwls_values) - 1") + } + + cell_types <- colnames(ave_celltype_exp) + data.table::setcolorder(dwls_values, c("cell_ID", cell_types)) + + # 2. for each spot + # calculate dwls predicted expression for features + expMatrixDWLS <- matrix( + data = NA, + nrow = nrow(ave_celltype_exp), + ncol = nrow(dwls_values) + ) + + average_exp <- as.matrix(ave_celltype_exp) + for (spot_i in 1:nrow(dwls_values)) { + spot <- dwls_values[spot_i, 1] + spot_dwls <- dwls_values[spot_i, -1] + data.table::setDF(spot_dwls) + spot_dwls <- as.vector(t(spot_dwls)[, 1]) + + spot_exp <- average_exp %*% spot_dwls + + expMatrixDWLS[, spot_i] <- spot_exp + } + rownames(expMatrixDWLS) <- rownames(ave_celltype_exp) + colnames(expMatrixDWLS) <- dwls_values$cell_ID + + return(expMatrixDWLS) } #' @title Calculate feature expression residual #' @name .cal_expr_residual -#' @description Calculate feature expression residual (observed_exp - DWLS_predicted) +#' @description Calculate feature expression residual +#' (observed_exp - DWLS_predicted) #' #' @param gobject giotto object #' @param spat_unit spatial unit (e.g. 'cell') #' @param feat_type feature type (e.g. 'rna') -#' @param expression_values expression values to use (e.g. 'normalized', 'scaled', 'custom') +#' @param expression_values expression values to use +#' (e.g. 
'normalized', 'scaled', 'custom') #' @param ave_celltype_exp average expression matrix in cell types #' #' @keywords internal .cal_expr_residual <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - ave_celltype_exp) { - - # expression data - values = match.arg(expression_values, choices = c('normalized', 'scaled', 'custom')) - # expr_observed = get_expression_values(gobject = gobject, - # values = expression_values, - # output = 'matrix') - expr_observed = slot(gobject@expression[[spat_unit]][[feat_type]][[values]], 'exprMat') - - # Compute predicted feature expression value - expr_predicted = featExpDWLS(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - ave_celltype_exp = ave_celltype_exp) - - # Get the difference expression matrix between observed and predicted expression - intersect_feature = intersect(rownames(expr_predicted), rownames(expr_observed)) - expr_residual = expr_observed[intersect_feature,] - expr_predicted[intersect_feature,] - expr_residual = as.matrix(expr_residual) - - return(expr_residual) + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + ave_celltype_exp) { + # expression data + values <- match.arg( + expression_values, choices = c("normalized", "scaled", "custom")) + + expr_observed <- slot(gobject@expression[[spat_unit]][[ + feat_type]][[values]], "exprMat") + + # Compute predicted feature expression value + expr_predicted <- featExpDWLS( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + ave_celltype_exp = ave_celltype_exp + ) + + # Get the difference expression matrix between observed and predicted + # expression + intersect_feature <- intersect( + rownames(expr_predicted), rownames(expr_observed)) + expr_residual <- expr_observed[intersect_feature, ] - expr_predicted[ + intersect_feature, ] + expr_residual <- as.matrix(expr_residual) + + return(expr_residual) } @@ -465,109 +531,124 @@ featExpDWLS = function(gobject, #' @param spatial_network_name name of spatial network to use #' @param cluster_column name of column to use for clusters #' -#' @return matrix that rownames are cell-cell interaction pairs and colnames are cell_IDs +#' @return matrix that rownames are cell-cell interaction pairs and colnames +#' are cell_IDs #' @export cellProximityEnrichmentEachSpot <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - spatial_network_name = 'spatial_network', - cluster_column = 'cell_ID') { - - spatial_network_annot = annotateSpatialNetwork(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - # data.table variables - V1 = V2 = from = to = int_cell_IDS = Var1 = Var2 = unified_cells = type_int = N = NULL - - spatial_network_annot = dt_sort_combine_two_columns(spatial_network_annot, 'to', 'from', 'unified_cells') - spatial_network_annot = spatial_network_annot[!duplicated(unified_cells)] - - # exact spatial_enrichment matrix - dwls_values = get_spatial_enrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - enrichm_name = 'DWLS', - output = 'data.table') - data.table::setDF(dwls_values) - rownames_dwls = dwls_values[,'cell_ID'] - dwls_values = as.matrix(dwls_values[,-1]) - rownames(dwls_values) = rownames_dwls - - # calculate cell-cell interaction with feature expression in each spot - # proximity_spots is the - # get cell-cell types pairs - cts = colnames(dwls_values) - 
ct_pairs = data.table::data.table(V1 = rep(cts,each = length(cts)), V2 = rep(cts,length(cts))) - ct_pairs[, unified_int := paste0(V1,'--',V2), by = 1:nrow(ct_pairs)] - unified_int = ct_pairs$unified_int - - - # get paired spots barcodes - orig_pairs = spatial_network_annot[, .N, by = c('from', 'to')] - cell_IDs = unique(c(orig_pairs$from, orig_pairs$to)) - - # make matrix that rows are cell-cell types and columns are cell_IDs - proximityMat = matrix(data = 0, - nrow = length(unified_int), - ncol = length(cell_IDs)) - - rownames(proximityMat) = unified_int - colnames(proximityMat) = cell_IDs - - # for each spot, calculate cell type proximity to it - for (cell_i in seq_along(cell_IDs)){ - cell_ID = cell_IDs[cell_i] - spot_pairs = orig_pairs[from == cell_ID | to == cell_ID] - spot_pairs[, int_cell_IDS := ifelse(from==cell_ID, to, from)] - int_num = spot_pairs$N - - dwls_target_cell = dwls_values[cell_ID,] - dwls_int_cells = dwls_values[spot_pairs$int_cell_IDS,] - - # filter 0 and kept the data type - # rowSum(dwls) = c(1,1,1,1.....) - idx1 = which(dwls_target_cell > 0) #length(idx) must > 0 - dwls_target_cell = dwls_target_cell[idx1] - - if (length(int_num) > 1){ - idx2 = which(colSums(dwls_int_cells) > 0) - dwls_int_cells = dwls_int_cells[, idx2] - - # all the interacted cells dwls have same cell type with proportion=1 - if (length(idx2) == 1){ - dwls_int_cells = matrix(dwls_int_cells, ncol = 1, - dimnames = list(spot_pairs$int_cell_IDS,names(idx2))) - } - - } else{ - # target cell only contain 1 inteacted cell - idx2 = which(dwls_int_cells > 0) - dwls_int_cells = dwls_int_cells[idx2] - dwls_int_cells = matrix(dwls_int_cells,nrow=1,byrow = TRUE, - dimnames = list(spot_pairs$int_cell_IDS,names(dwls_int_cells))) - } - + spat_unit = NULL, + feat_type = NULL, + spatial_network_name = "spatial_network", + cluster_column = "cell_ID") { + spatial_network_annot <- annotateSpatialNetwork( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) - spot_proximity = dwls_target_cell %o% (dwls_int_cells * int_num) - spot_proximity = apply(spot_proximity, 3, rowSums) - if (length(dwls_target_cell) == 1){ - # change to the right data class - spot_proximity = matrix(spot_proximity,nrow=1,byrow = TRUE, - dimnames = list(names(dwls_target_cell),names(spot_proximity))) + # data.table variables + V1 <- V2 <- from <- to <- int_cell_IDS <- Var1 <- Var2 <- + unified_cells <- type_int <- N <- NULL + + spatial_network_annot <- dt_sort_combine_two_columns( + spatial_network_annot, "to", "from", "unified_cells") + spatial_network_annot <- spatial_network_annot[!duplicated(unified_cells)] + + # exact spatial_enrichment matrix + dwls_values <- getSpatialEnrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = "DWLS", + output = "data.table" + ) + data.table::setDF(dwls_values) + rownames_dwls <- dwls_values[, "cell_ID"] + dwls_values <- as.matrix(dwls_values[, -1]) + rownames(dwls_values) <- rownames_dwls + + # calculate cell-cell interaction with feature expression in each spot + # proximity_spots is the + # get cell-cell types pairs + cts <- colnames(dwls_values) + ct_pairs <- data.table::data.table( + V1 = rep(cts, each = length(cts)), V2 = rep(cts, length(cts))) + ct_pairs[, unified_int := paste0(V1, "--", V2), by = 1:nrow(ct_pairs)] + unified_int <- ct_pairs$unified_int + + + # get paired spots barcodes + orig_pairs <- spatial_network_annot[, .N, by = c("from", "to")] + cell_IDs <- 
unique(c(orig_pairs$from, orig_pairs$to)) + + # make matrix that rows are cell-cell types and columns are cell_IDs + proximityMat <- matrix( + data = 0, + nrow = length(unified_int), + ncol = length(cell_IDs) + ) + rownames(proximityMat) <- unified_int + colnames(proximityMat) <- cell_IDs + + # for each spot, calculate cell type proximity to it + for (cell_i in seq_along(cell_IDs)) { + cell_ID <- cell_IDs[cell_i] + spot_pairs <- orig_pairs[from == cell_ID | to == cell_ID] + spot_pairs[, int_cell_IDS := ifelse(from == cell_ID, to, from)] + int_num <- spot_pairs$N + + dwls_target_cell <- dwls_values[cell_ID, ] + dwls_int_cells <- dwls_values[spot_pairs$int_cell_IDS, ] + + # filter 0 and kept the data type + # rowSum(dwls) = c(1,1,1,1.....) + idx1 <- which(dwls_target_cell > 0) # length(idx) must > 0 + dwls_target_cell <- dwls_target_cell[idx1] + + if (length(int_num) > 1) { + idx2 <- which(colSums(dwls_int_cells) > 0) + dwls_int_cells <- dwls_int_cells[, idx2] + + # all the interacted cells dwls have same cell type with + # proportion=1 + if (length(idx2) == 1) { + dwls_int_cells <- matrix(dwls_int_cells, + ncol = 1, + dimnames = list(spot_pairs$int_cell_IDS, names(idx2)) + ) + } + } else { + # target cell only contain 1 inteacted cell + idx2 <- which(dwls_int_cells > 0) + dwls_int_cells <- dwls_int_cells[idx2] + dwls_int_cells <- matrix(dwls_int_cells, + nrow = 1, byrow = TRUE, + dimnames = list(spot_pairs$int_cell_IDS, names(dwls_int_cells)) + ) + } + + + spot_proximity <- dwls_target_cell %o% (dwls_int_cells * int_num) + spot_proximity <- apply(spot_proximity, 3, rowSums) + if (length(dwls_target_cell) == 1) { + # change to the right data class + spot_proximity <- matrix(spot_proximity, + nrow = 1, byrow = TRUE, + dimnames = list(names(dwls_target_cell), names(spot_proximity)) + ) + } + spot_proximity <- reshape2::melt(spot_proximity) + spot_proximity <- data.table::data.table(spot_proximity) + spot_proximity[, c("Var1", "Var2") := lapply( + .SD, as.character), .SDcols = c("Var1", "Var2")] + spot_proximity[, unified_int := paste0(Var1, "--", Var2)] + + # add to proximityMat(matrix) + proximityMat[spot_proximity$unified_int, cell_i] <- spot_proximity$value } - spot_proximity = reshape2::melt(spot_proximity) - spot_proximity = data.table::data.table(spot_proximity) - spot_proximity[, c('Var1', 'Var2') := lapply(.SD, as.character),.SDcols = c('Var1', 'Var2')] - spot_proximity[, unified_int := paste0(Var1,'--',Var2)] - - # add to proximityMat(matrix) - proximityMat[spot_proximity$unified_int, cell_i] = spot_proximity$value - } - return(proximityMat) + return(proximityMat) } #' @title Calculate difference per interaction @@ -576,48 +657,52 @@ cellProximityEnrichmentEachSpot <- function(gobject, #' cell proximity score of selected cell for spots #' @keywords internal .cal_diff_per_interaction <- function(sel_int, - other_ints, - select_ind, - other_ind, - proximityMat, - expr_residual){ - - pcc_diff <- sel <- other <- NULL - - # get data - - prox_sel = proximityMat[sel_int, select_ind] - prox_sel = as.matrix(prox_sel) - expr_sel = expr_residual[, select_ind] - - prox_other = proximityMat[other_ints,other_ind] - prox_other = prox_other[rowSums(prox_other) != 0, ] - expr_other = expr_residual[, other_ind] - - # calculate pcc between expresidiual and proximity - pcc_sel = stats::cor(t(expr_sel), prox_sel) - - pcc_other = stats::cor(t(expr_other), t(prox_other)) - pcc_other = rowMeans(pcc_other) - - features = rownames(pcc_sel) - pcc_dt = data.table::data.table(features = features, - pcc_sel = 
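For one spot, the loop above scores every cell-type pair via an outer product between the spot's own DWLS proportions and its neighbours' edge-count-weighted proportions, then collapses over neighbours. A self-contained toy version of that `%o%` / `apply(..., 3, rowSums)` pattern, with made-up proportions and hypothetical names, is:

dwls_target <- c(astro = 0.7, endo = 0.3)       # proportions in the spot itself
dwls_neighbours <- rbind(
    n1 = c(astro = 0.2, endo = 0.8),            # proportions in two neighbour spots
    n2 = c(astro = 0.5, endo = 0.5))
edge_counts <- c(1, 2)                          # network edges to each neighbour

# target cell type x neighbour x neighbour cell type
prox_array <- dwls_target %o% (dwls_neighbours * edge_counts)
prox_pairs <- apply(prox_array, 3, rowSums)     # collapse over neighbours

Melting a matrix like `prox_pairs` yields the "Var1--Var2" labels that become the `unified_int` rownames of `proximityMat`.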
as.vector(pcc_sel), - pcc_other = pcc_other[features]) + other_ints, + select_ind, + other_ind, + proximityMat, + expr_residual) { + pcc_diff <- sel <- other <- NULL + + # get data + + prox_sel <- proximityMat[sel_int, select_ind] + prox_sel <- as.matrix(prox_sel) + expr_sel <- expr_residual[, select_ind] + + prox_other <- proximityMat[other_ints, other_ind] + prox_other <- prox_other[rowSums(prox_other) != 0, ] + expr_other <- expr_residual[, other_ind] + + # calculate pcc between expresidiual and proximity + pcc_sel <- stats::cor(t(expr_sel), prox_sel) + + pcc_other <- stats::cor(t(expr_other), t(prox_other)) + pcc_other <- rowMeans(pcc_other) + + features <- rownames(pcc_sel) + pcc_dt <- data.table::data.table( + features = features, + pcc_sel = as.vector(pcc_sel), + pcc_other = pcc_other[features] + ) - pcc_dt[, pcc_diff := pcc_sel - pcc_other] + pcc_dt[, pcc_diff := pcc_sel - pcc_other] - # calculate mean exression residual - expr_sel_mean = rowMeans(expr_sel) - expr_other_mean = rowMeans(expr_other) - expr_residual_dt = data.table::data.table(features = features, - sel = expr_sel_mean[features], - other = expr_other_mean[features]) - expr_residual_dt[, diff := sel - other] + # calculate mean exression residual + expr_sel_mean <- rowMeans(expr_sel) + expr_other_mean <- rowMeans(expr_other) + expr_residual_dt <- data.table::data.table( + features = features, + sel = expr_sel_mean[features], + other = expr_other_mean[features] + ) + expr_residual_dt[, diff := sel - other] - results_dt = data.table::merge.data.table(expr_residual_dt, pcc_dt, by = 'features') + results_dt <- data.table::merge.data.table( + expr_residual_dt, pcc_dt, by = "features") - return(results_dt) + return(results_dt) } @@ -632,288 +717,309 @@ NULL #' @describeIn do_permuttest_spot Calculate original values for spots #' @keywords internal .do_permuttest_original_spot <- function(sel_int, - other_ints, - select_ind, - other_ind, - name = 'orig', - proximityMat, - expr_residual) { - - resultsDT = .cal_diff_per_interaction(sel_int = sel_int, - other_ints = other_ints, - select_ind = select_ind, - other_ind = other_ind, - proximityMat = proximityMat, - expr_residual = expr_residual) - resultsDT[, name := name] - return(resultsDT) + other_ints, + select_ind, + other_ind, + name = "orig", + proximityMat, + expr_residual) { + resultsDT <- .cal_diff_per_interaction( + sel_int = sel_int, + other_ints = other_ints, + select_ind = select_ind, + other_ind = other_ind, + proximityMat = proximityMat, + expr_residual = expr_residual + ) + resultsDT[, name := name] + return(resultsDT) } #' @describeIn do_permuttest_spot Calculate random values for spots #' @keywords internal .do_permuttest_random_spot <- function(sel_int, - other_ints, - select_ind, - other_ind, - name = 'perm_1', - proximityMat, - expr_residual, - set_seed = TRUE, - seed_number = 1234) { - - # data.table variables - features = NULL - - l_sel_int = length(sel_int) - l_other_ints = length(other_ints) - l_select_ind = length(select_ind) - l_other_ind = length(other_ind) - - all_IDs = colnames(proximityMat) - all_ints = rownames(proximityMat) - all_ints = all_ints[!rownames(proximityMat) %in% sel_int] - - if(set_seed == TRUE) { - set.seed(seed = seed_number) - } - random_sel_int = sample(all_ints, size = l_sel_int, replace = F) - random_other_ints = sample(all_ints, size = l_other_ints, replace = F) - - # keep the random selete not all the zeros - prox = proximityMat[random_sel_int,] - prox = prox[prox>0] - random_select = c(sample(all_IDs, size = l_select_ind - 1, 
replace = F), names(prox[1])) - random_other = c(sample(all_IDs, size = l_other_ind, replace = F), names(prox[length(prox)])) - - resultsDT = .cal_diff_per_interaction(sel_int = random_sel_int, - other_ints = random_other_ints, - select_ind = random_select, - other_ind = random_other, - proximityMat = proximityMat, - expr_residual = expr_residual) - resultsDT[, name := name] - - return(resultsDT) + other_ints, + select_ind, + other_ind, + name = "perm_1", + proximityMat, + expr_residual, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + features <- NULL + + l_sel_int <- length(sel_int) + l_other_ints <- length(other_ints) + l_select_ind <- length(select_ind) + l_other_ind <- length(other_ind) + + all_IDs <- colnames(proximityMat) + all_ints <- rownames(proximityMat) + all_ints <- all_ints[!rownames(proximityMat) %in% sel_int] + + if (set_seed == TRUE) { + set.seed(seed = seed_number) + } + random_sel_int <- sample(all_ints, size = l_sel_int, replace = FALSE) + random_other_ints <- sample(all_ints, size = l_other_ints, replace = FALSE) + + # keep the random selete not all the zeros + prox <- proximityMat[random_sel_int, ] + prox <- prox[prox > 0] + random_select <- c(sample( + all_IDs, size = l_select_ind - 1, replace = FALSE), names(prox[1])) + random_other <- c(sample( + all_IDs, size = l_other_ind, replace = FALSE), + names(prox[length(prox)])) + + resultsDT <- .cal_diff_per_interaction( + sel_int = random_sel_int, + other_ints = random_other_ints, + select_ind = random_select, + other_ind = random_other, + proximityMat = proximityMat, + expr_residual = expr_residual + ) + resultsDT[, name := name] + + return(resultsDT) } #' @describeIn do_permuttest_spot Calculate multiple random values for spots #' @keywords internal -.do_multi_permuttest_random_spot = function(sel_int, - other_ints, - select_ind, - other_ind, - proximityMat, - expr_residual, - n = 100, - cores = NA, - set_seed = TRUE, - seed_number = 1234) { - - if(set_seed == TRUE) { - seed_number_list = seed_number:(seed_number + (n-1)) - } - - result = lapply_flex(X = 1:n, cores = cores, fun = function(x) { - - seed_number = seed_number_list[x] - - perm_rand = .do_permuttest_random_spot(sel_int = sel_int, - other_ints = other_ints, - select_ind = select_ind, - other_ind = other_ind, - name = paste0('perm_', x), - proximityMat = proximityMat, - expr_residual = expr_residual, - set_seed = set_seed, - seed_number = seed_number) - - }) - - final_result = do.call('rbind', result) +.do_multi_permuttest_random_spot <- function(sel_int, + other_ints, + select_ind, + other_ind, + proximityMat, + expr_residual, + n = 100, + cores = NA, + set_seed = TRUE, + seed_number = 1234) { + if (set_seed == TRUE) { + seed_number_list <- seed_number:(seed_number + (n - 1)) + } + + result <- lapply_flex(X = 1:n, cores = cores, fun = function(x) { + seed_number <- seed_number_list[x] + + perm_rand <- .do_permuttest_random_spot( + sel_int = sel_int, + other_ints = other_ints, + select_ind = select_ind, + other_ind = other_ind, + name = paste0("perm_", x), + proximityMat = proximityMat, + expr_residual = expr_residual, + set_seed = set_seed, + seed_number = seed_number + ) + }) + final_result <- do.call("rbind", result) } -#' @describeIn do_permuttest_spot Performs permutation test on subsets of a matrix for spots +#' @describeIn do_permuttest_spot Performs permutation test on subsets of a +#' matrix for spots #' @keywords internal -.do_permuttest_spot = function(sel_int, - other_ints, - select_ind, - other_ind, - proximityMat, - 
expr_residual, - n_perm = 100, - adjust_method = 'fdr', - cores = 2, - set_seed = TRUE, - seed_number = 1234) { - - - - # data.table variables - log2fc_diff = log2fc = sel = other = features = p_higher = p_lower = perm_sel = NULL - perm_other = perm_log2fc = perm_diff = p.value = p.adj = pcc_sel = pcc_diff = NULL - perm_pcc_sel = perm_pcc_diff = pcc_other = NULL - - ## original data - original = .do_permuttest_original_spot(sel_int = sel_int, - other_ints = other_ints , - select_ind = select_ind, - other_ind = other_ind, - name = 'orig', - proximityMat = proximityMat, - expr_residual = expr_residual) - - ## random permutations - random_perms = .do_multi_permuttest_random_spot(sel_int = sel_int, - other_ints = other_ints, - select_ind = select_ind, - other_ind = other_ind, - proximityMat = proximityMat, - expr_residual = expr_residual, - n = n_perm, - cores = cores, - set_seed = set_seed, - seed_number = seed_number) - - ## - #random_perms[, log2fc_diff := rep(original$log2fc, n_perm) - log2fc] - random_perms[, c('perm_sel', 'perm_other', 'perm_pcc_sel', 'perm_pcc_diff') := list(mean(sel), mean(other), mean(pcc_sel), mean(pcc_diff)), by = features] - - ## get p-values - random_perms[, p_higher := sum(pcc_diff > 0), by = features] - random_perms[, p_higher := 1-(p_higher/n_perm)] - random_perms[, p_lower := sum(pcc_diff < 0), by = features] - random_perms[, p_lower := 1-(p_lower/n_perm)] - - ## combine results permutation and original - random_perms_res = unique(random_perms[,.(features, perm_sel, perm_other, perm_pcc_sel, perm_pcc_diff, p_higher, p_lower)]) - results_m = data.table::merge.data.table(random_perms_res, original[,.(features, sel, other, diff, pcc_sel, pcc_other, pcc_diff)], by = 'features') - - # select lowest p-value and perform p.adj - results_m[, p.value := ifelse(p_higher <= p_lower, p_higher, p_lower)] - results_m[, p.adj := stats::p.adjust(p.value, method = adjust_method)] - - results_m = results_m[,.(features, sel, other, pcc_sel, pcc_other, pcc_diff, p.value, p.adj, perm_sel, perm_other, perm_pcc_sel, perm_pcc_diff)] - setorder(results_m, p.adj, -pcc_diff) - - return(results_m) +.do_permuttest_spot <- function(sel_int, + other_ints, + select_ind, + other_ind, + proximityMat, + expr_residual, + n_perm = 100, + adjust_method = "fdr", + cores = 2, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + log2fc_diff <- log2fc <- sel <- other <- features <- p_higher <- + p_lower <- perm_sel <- NULL + perm_other <- perm_log2fc <- perm_diff <- p.value <- p.adj <- + pcc_sel <- pcc_diff <- NULL + perm_pcc_sel <- perm_pcc_diff <- pcc_other <- NULL + + ## original data + original <- .do_permuttest_original_spot( + sel_int = sel_int, + other_ints = other_ints, + select_ind = select_ind, + other_ind = other_ind, + name = "orig", + proximityMat = proximityMat, + expr_residual = expr_residual + ) + + ## random permutations + random_perms <- .do_multi_permuttest_random_spot( + sel_int = sel_int, + other_ints = other_ints, + select_ind = select_ind, + other_ind = other_ind, + proximityMat = proximityMat, + expr_residual = expr_residual, + n = n_perm, + cores = cores, + set_seed = set_seed, + seed_number = seed_number + ) + ## + # random_perms[, log2fc_diff := rep(original$log2fc, n_perm) - log2fc] + random_perms[, c( + "perm_sel", "perm_other", "perm_pcc_sel", "perm_pcc_diff") := list( + mean(sel), mean(other), mean(pcc_sel), mean(pcc_diff)), + by = features] + + ## get p-values + random_perms[, p_higher := sum(pcc_diff > 0), by = features] + random_perms[, p_higher := 1 - 
(p_higher / n_perm)] + random_perms[, p_lower := sum(pcc_diff < 0), by = features] + random_perms[, p_lower := 1 - (p_lower / n_perm)] + + ## combine results permutation and original + random_perms_res <- unique(random_perms[, .( + features, perm_sel, perm_other, perm_pcc_sel, perm_pcc_diff, + p_higher, p_lower)]) + results_m <- data.table::merge.data.table( + random_perms_res, + original[, .(features, sel, other, diff, pcc_sel, pcc_other, pcc_diff)], + by = "features") + + # select lowest p-value and perform p.adj + results_m[, p.value := ifelse(p_higher <= p_lower, p_higher, p_lower)] + results_m[, p.adj := stats::p.adjust(p.value, method = adjust_method)] + + results_m <- results_m[, .( + features, sel, other, pcc_sel, pcc_other, pcc_diff, p.value, p.adj, + perm_sel, perm_other, perm_pcc_sel, perm_pcc_diff)] + setorder(results_m, p.adj, -pcc_diff) + + return(results_m) } #' @title Cell proximity testing for spot data #' @name .do_cell_proximity_test_spot -#' @description Performs a selected differential test on subsets of a matrix for spots +#' @description Performs a selected differential test on subsets of a matrix +#' for spots #' @keywords internal -.do_cell_proximity_test_spot = function(sel_int, - other_ints, - select_ind, - other_ind, - proximityMat, - expr_residual, - diff_test, - n_perm = 100, - adjust_method = 'fdr', - cores = 2, - set_seed = TRUE, - seed_number = 1234) { - - # get parameters - diff_test = match.arg(diff_test, choices = c('permutation', 'limma', 't.test', 'wilcox')) - adjust_method = match.arg(adjust_method, choices = c("bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "fdr", "none")) - - - if(diff_test == 'permutation') { - result = .do_permuttest_spot(sel_int = sel_int, - other_ints = other_ints, - select_ind = select_ind, - other_ind = other_ind, - proximityMat = proximityMat, - expr_residual = expr_residual, - n_perm = n_perm, - adjust_method = adjust_method, - cores = cores, - set_seed = set_seed, - seed_number = seed_number) - - } - return(result) - +.do_cell_proximity_test_spot <- function(sel_int, + other_ints, + select_ind, + other_ind, + proximityMat, + expr_residual, + diff_test, + n_perm = 100, + adjust_method = "fdr", + cores = 2, + set_seed = TRUE, + seed_number = 1234) { + # get parameters + diff_test <- match.arg( + diff_test, choices = c("permutation", "limma", "t.test", "wilcox")) + adjust_method <- match.arg(adjust_method, choices = c( + "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "fdr", "none" + )) + + + if (diff_test == "permutation") { + result <- .do_permuttest_spot( + sel_int = sel_int, + other_ints = other_ints, + select_ind = select_ind, + other_ind = other_ind, + proximityMat = proximityMat, + expr_residual = expr_residual, + n_perm = n_perm, + adjust_method = adjust_method, + cores = cores, + set_seed = set_seed, + seed_number = seed_number + ) + } + return(result) } #' @title Find ICF per interaction for spot data #' @name .findICF_per_interaction_spot -#' @description Identifies features that are differentially expressed due to proximity to other cell types for spots. +#' @description Identifies features that are differentially expressed due to +#' proximity to other cell types for spots. 
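Stepping back from the diff: the permutation machinery above compares an observed PCC difference against differences recomputed under randomized labels, and turns the permutation counts into one-sided empirical p-values. A minimal, self-contained illustration of that step (all numbers invented):

set.seed(1234)
n_perm <- 100
perm_diff <- rnorm(n_perm, sd = 0.1)   # pretend permuted pcc_diff values for one feature

p_higher <- 1 - sum(perm_diff > 0) / n_perm
p_lower  <- 1 - sum(perm_diff < 0) / n_perm
p_value  <- min(p_higher, p_lower)     # lowest one-sided p, mirroring .do_permuttest_spot()
p_adj    <- stats::p.adjust(c(p_value, 0.03, 0.2), method = "fdr")   # toy p-value vector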
#' @keywords internal .findICF_per_interaction_spot <- function(sel_int, - all_ints, - proximityMat, - expr_residual, - dwls_values, - dwls_cutoff = 0.001, - CCI_cell_score = 0.01, - minimum_unique_cells = 1, - minimum_unique_int_cells = 1, - diff_test = 'permutation', - n_perm = 100, - adjust_method = 'fdr', - cores = 2, - set_seed = TRUE, - seed_number = 1234){ - - # data.table variables - unified_int = NULL - - sel_ct = strsplit(sel_int, '--')[[1]][1] - int_ct = strsplit(sel_int, '--')[[1]][2] - - # filter out cells that without these two cellsltype - prox_sel = proximityMat[sel_int,] - prox_sel = prox_sel[which(prox_sel != 0)] - prox_sel = prox_sel[which(prox_sel > CCI_cell_score)] - spec_IDs = names(prox_sel) - - # find other cells contribution to cell type - dwls_all_cell = dwls_values[, sel_ct] - dwls_all_cell = dwls_all_cell[dwls_all_cell > dwls_cutoff] - all_IDs = intersect(names(dwls_all_cell), colnames(proximityMat)) - other_IDs = setdiff(all_IDs, spec_IDs) - - other_ints = all_ints['cell_type' == sel_ct]$unified_int - - other_ints = other_ints[-which(other_ints == sel_int)] - - ## do not continue if too few cells ## - if(length(spec_IDs) < minimum_unique_cells | length(other_IDs) < minimum_unique_cells) { - result = NULL - } else { - result = .do_cell_proximity_test_spot(sel_int = sel_int, - other_ints = other_ints, - select_ind = spec_IDs, - other_ind = other_IDs, - proximityMat = proximityMat, - expr_residual = expr_residual, - diff_test = diff_test, - n_perm = n_perm, - adjust_method = adjust_method, - cores = cores, - set_seed = set_seed, - seed_number = seed_number) - - result[, 'cell_type' := sel_ct] - result[, 'int_cell_type' := int_ct] - result[, 'nr_select' := length(spec_IDs)] - result[, 'int_nr_select' := length(other_IDs)] - result[, 'unif_int' := sel_int] - } - - return(result) + all_ints, + proximityMat, + expr_residual, + dwls_values, + dwls_cutoff = 0.001, + CCI_cell_score = 0.01, + minimum_unique_cells = 1, + minimum_unique_int_cells = 1, + diff_test = "permutation", + n_perm = 100, + adjust_method = "fdr", + cores = 2, + set_seed = TRUE, + seed_number = 1234) { + # data.table variables + unified_int <- NULL + + sel_ct <- strsplit(sel_int, "--")[[1]][1] + int_ct <- strsplit(sel_int, "--")[[1]][2] + + # filter out cells that without these two cellsltype + prox_sel <- proximityMat[sel_int, ] + prox_sel <- prox_sel[which(prox_sel != 0)] + prox_sel <- prox_sel[which(prox_sel > CCI_cell_score)] + spec_IDs <- names(prox_sel) + + # find other cells contribution to cell type + dwls_all_cell <- dwls_values[, sel_ct] + dwls_all_cell <- dwls_all_cell[dwls_all_cell > dwls_cutoff] + all_IDs <- intersect(names(dwls_all_cell), colnames(proximityMat)) + other_IDs <- setdiff(all_IDs, spec_IDs) + + other_ints <- all_ints["cell_type" == sel_ct]$unified_int + + other_ints <- other_ints[-which(other_ints == sel_int)] + + ## do not continue if too few cells ## + if (length(spec_IDs) < minimum_unique_cells | length( + other_IDs) < minimum_unique_cells) { + result <- NULL + } else { + result <- .do_cell_proximity_test_spot( + sel_int = sel_int, + other_ints = other_ints, + select_ind = spec_IDs, + other_ind = other_IDs, + proximityMat = proximityMat, + expr_residual = expr_residual, + diff_test = diff_test, + n_perm = n_perm, + adjust_method = adjust_method, + cores = cores, + set_seed = set_seed, + seed_number = seed_number + ) + + result[, "cell_type" := sel_ct] + result[, "int_cell_type" := int_ct] + result[, "nr_select" := length(spec_IDs)] + result[, "int_nr_select" := 
length(other_IDs)] + result[, "unif_int" := sel_int] + } + return(result) } @@ -926,8 +1032,9 @@ NULL #' @title findICFSpot #' @name findICFSpot -#' @description Identifies cell-to-cell Interaction Changed Features (ICF) for spots, -#' i.e. features expression residual that are different due to proximity to other cell types. +#' @description Identifies cell-to-cell Interaction Changed Features (ICF) for +#' spots, i.e. features expression residual that are different due to proximity +#' to other cell types. #' #' @param gobject A giotto object #' @param spat_unit spatial unit (e.g. 'cell') @@ -950,11 +1057,14 @@ NULL #' @param verbose be verbose #' #' @return icfObject that contains the differential feat scores -#' @details Function to calculate if features expression residual are differentially expressed in cell types -#' when they interact (approximated by physical proximity) with other cell types. -#' Feature expression residual calculated as: -#' (observed expression in spot - cell_type_proportion * average_expressed_in_cell_type) -#' The results data.table in the icfObject contains - at least - the following columns: +#' @details Function to calculate if features expression residual are +#' differentially expressed in cell types when they interact +#' (approximated by physical proximity) with other cell types. +#' Feature expression residual calculated as: +#' (observed expression in spot - cell_type_proportion * +#' average_expressed_in_cell_type) +#' The results data.table in the icfObject contains - at least - +#' the following columns: #' \itemize{ #' \item{features:}{ All or selected list of tested features} #' \item{sel:}{ average feature expression residual in the interacting cells from the target cell type } @@ -972,141 +1082,153 @@ NULL #' } #' @export findICFSpot <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - ave_celltype_exp, - selected_features = NULL, - spatial_network_name = 'Delaunay_network', - minimum_unique_cells = 5, - minimum_unique_int_cells = 5, - CCI_cell_score = 0.1, - dwls_cutoff = 0.001, - diff_test = 'permutation', - nr_permutations = 100, - adjust_method = 'fdr', - do_parallel = TRUE, - cores = NA, - set_seed = TRUE, - seed_number = 1234, - verbose = FALSE) { - - # data.table variables - unified_int = NULL - - # expression data - values = match.arg(expression_values, choices = c('normalized', 'scaled', 'custom')) - features_overlap = intersect(slot(gobject, "feat_ID")[[feat_type]], rownames(ave_celltype_exp)) - ave_celltype_exp_sel = ave_celltype_exp[features_overlap,] - expr_residual = .cal_expr_residual(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - ave_celltype_exp = ave_celltype_exp_sel) - - ## test selected features ## - if(!is.null(selected_features)) { - expr_residual = expr_residual[rownames(expr_residual) %in% selected_features, ] - } - - # compute cell proximity for each spot - proximityMat = cellProximityEnrichmentEachSpot(gobject = gobject, - spatial_network_name = spatial_network_name) - # select overlapped spots - #intersect_cell_IDs = intersect(colnames(expr_residual), colnames(proximityMat)) - #expr_residual = expr_residual[, intersect_cell_IDs] - #proximityMat = proximityMat[, intersect_cell_IDs] - - # compute correlation between features and cell-types to find ICFs - all_ints = data.table::data.table(unified_int = rownames(proximityMat)) - all_ints[, cell_type := strsplit(as.character(unified_int), '--')[[1]][1], by = 1:nrow(all_ints)] - 
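The residual formula quoted in the @details above can be read as one matrix product followed by a subtraction. A toy numerical version, with hypothetical names and two features, two cell types and two spots:

ave_celltype_exp <- matrix(
    c(10, 1, 2, 8), nrow = 2,
    dimnames = list(c("featA", "featB"), c("astro", "endo")))
cell_props <- matrix(
    c(0.7, 0.3, 0.1, 0.9), nrow = 2,
    dimnames = list(c("astro", "endo"), c("spot1", "spot2")))
observed <- matrix(
    c(9, 4, 2, 7), nrow = 2,
    dimnames = list(c("featA", "featB"), c("spot1", "spot2")))

predicted <- ave_celltype_exp %*% cell_props   # cell_type_proportion * average expression
residual  <- observed - predicted              # as computed by .cal_expr_residual() above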
all_ints[, int_cell_type := strsplit(as.character(unified_int), '--')[[1]][2], by = 1:nrow(all_ints)] - - # exact spatial_enrichment matrix - dwls_values = get_spatial_enrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table') - data.table::setDF(dwls_values) - rownames_dwls = dwls_values[,'cell_ID'] - dwls_values = as.matrix(dwls_values[,-1]) - rownames(dwls_values) = rownames_dwls - - - if(do_parallel == TRUE) { - - fin_result = lapply_flex(X = all_ints$unified_int, cores = cores, fun = function(x) { - - tempres = .findICF_per_interaction_spot(sel_int = x, - all_ints = all_ints, - proximityMat = proximityMat, - expr_residual = expr_residual, - dwls_values = dwls_values, - dwls_cutoff = dwls_cutoff, - CCI_cell_score = CCI_cell_score, - minimum_unique_cells = minimum_unique_cells, - minimum_unique_int_cells = minimum_unique_int_cells, - n_perm = nr_permutations, - adjust_method = adjust_method, - cores = cores, - set_seed = set_seed, - seed_number = seed_number) - }) - - - } else { - - fin_result = list() - - for(i in seq_along(all_ints$unified_int)) { - - x = all_ints$unified_int[i] - - tempres = .findICF_per_interaction_spot(sel_int = x, - all_ints = all_ints, - proximityMat = proximityMat, - expr_residual = expr_residual, - dwls_values = dwls_values, - dwls_cutoff = dwls_cutoff, - CCI_cell_score = CCI_cell_score, - minimum_unique_cells = minimum_unique_cells, - minimum_unique_int_cells = minimum_unique_int_cells, - n_perm = nr_permutations, - adjust_method = adjust_method, - cores = 2, - set_seed = set_seed, - seed_number = seed_number) - - fin_result[[i]] = tempres + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", "scaled", "custom"), + ave_celltype_exp, + selected_features = NULL, + spatial_network_name = "Delaunay_network", + minimum_unique_cells = 5, + minimum_unique_int_cells = 5, + CCI_cell_score = 0.1, + dwls_cutoff = 0.001, + diff_test = "permutation", + nr_permutations = 100, + adjust_method = "fdr", + do_parallel = TRUE, + cores = NA, + set_seed = TRUE, + seed_number = 1234, + verbose = FALSE) { + # data.table variables + unified_int <- NULL + + # expression data + values <- match.arg( + expression_values, choices = c("normalized", "scaled", "custom")) + features_overlap <- intersect( + slot(gobject, "feat_ID")[[feat_type]], rownames(ave_celltype_exp)) + ave_celltype_exp_sel <- ave_celltype_exp[features_overlap, ] + expr_residual <- .cal_expr_residual( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + ave_celltype_exp = ave_celltype_exp_sel + ) + ## test selected features ## + if (!is.null(selected_features)) { + expr_residual <- expr_residual[ + rownames(expr_residual) %in% selected_features, ] } - } - - final_result = do.call('rbind', fin_result) + # compute cell proximity for each spot + proximityMat <- cellProximityEnrichmentEachSpot( + gobject = gobject, + spatial_network_name = spatial_network_name + ) + # select overlapped spots + + # compute correlation between features and cell-types to find ICFs + all_ints <- data.table::data.table(unified_int = rownames(proximityMat)) + all_ints[, cell_type := strsplit( + as.character(unified_int), "--")[[1]][1], by = 1:nrow(all_ints)] + all_ints[, int_cell_type := strsplit( + as.character(unified_int), "--")[[1]][2], by = 1:nrow(all_ints)] + + # exact spatial_enrichment matrix + dwls_values <- getSpatialEnrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = "DWLS", + output = "data.table" + ) + 
data.table::setDF(dwls_values) + rownames_dwls <- dwls_values[, "cell_ID"] + dwls_values <- as.matrix(dwls_values[, -1]) + rownames(dwls_values) <- rownames_dwls + + + if (do_parallel == TRUE) { + fin_result <- lapply_flex( + X = all_ints$unified_int, cores = cores, fun = function(x) { + tempres <- .findICF_per_interaction_spot( + sel_int = x, + all_ints = all_ints, + proximityMat = proximityMat, + expr_residual = expr_residual, + dwls_values = dwls_values, + dwls_cutoff = dwls_cutoff, + CCI_cell_score = CCI_cell_score, + minimum_unique_cells = minimum_unique_cells, + minimum_unique_int_cells = minimum_unique_int_cells, + n_perm = nr_permutations, + adjust_method = adjust_method, + cores = cores, + set_seed = set_seed, + seed_number = seed_number + ) + }) + } else { + fin_result <- list() + + for (i in seq_along(all_ints$unified_int)) { + x <- all_ints$unified_int[i] + + tempres <- .findICF_per_interaction_spot( + sel_int = x, + all_ints = all_ints, + proximityMat = proximityMat, + expr_residual = expr_residual, + dwls_values = dwls_values, + dwls_cutoff = dwls_cutoff, + CCI_cell_score = CCI_cell_score, + minimum_unique_cells = minimum_unique_cells, + minimum_unique_int_cells = minimum_unique_int_cells, + n_perm = nr_permutations, + adjust_method = adjust_method, + cores = 2, + set_seed = set_seed, + seed_number = seed_number + ) + + fin_result[[i]] <- tempres + } + } - # data.table variables - spec_int = cell_type = int_cell_type = type_int = NULL + final_result <- do.call("rbind", fin_result) - final_result[, spec_int := paste0(cell_type,'--',int_cell_type)] - final_result[, type_int := ifelse(cell_type == int_cell_type, 'homo', 'hetero')] + # data.table variables + spec_int <- cell_type <- int_cell_type <- type_int <- NULL + final_result[, spec_int := paste0(cell_type, "--", int_cell_type)] + final_result[, type_int := ifelse( + cell_type == int_cell_type, "homo", "hetero")] - #return(final_result) - permutation_test = ifelse(diff_test == 'permutation', nr_permutations, 'no permutations') + # return(final_result) - icfObject = list(ICFscores = final_result, - Giotto_info = list('values' = values, - 'cluster' = 'cell_ID', - 'spatial network' = spatial_network_name), - test_info = list('test' = diff_test, - 'p.adj' = adjust_method, - 'min cells' = minimum_unique_cells, - 'min interacting cells' = minimum_unique_int_cells, - 'perm' = permutation_test)) - class(icfObject) = append(class(icfObject), 'icfObject') - return(icfObject) + permutation_test <- ifelse( + diff_test == "permutation", nr_permutations, "no permutations") + icfObject <- list( + ICFscores = final_result, + Giotto_info = list( + "values" = values, + "cluster" = "cell_ID", + "spatial network" = spatial_network_name + ), + test_info = list( + "test" = diff_test, + "p.adj" = adjust_method, + "min cells" = minimum_unique_cells, + "min interacting cells" = minimum_unique_int_cells, + "perm" = permutation_test + ) + ) + class(icfObject) <- append(class(icfObject), "icfObject") + return(icfObject) } @@ -1116,9 +1238,11 @@ findICFSpot <- function(gobject, #' #' @param icfObject ICF (interaction changed feature) score object #' @param min_cells minimum number of source cell type -#' @param min_cells_expr_resi minimum expression residual level for source cell type +#' @param min_cells_expr_resi minimum expression residual level for source +#' cell type #' @param min_int_cells minimum number of interacting neighbor cell type -#' @param min_int_cells_expr_resi minimum expression residual level for interacting neighbor cell type +#' 
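For orientation, this is roughly how the refactored `findICFSpot()` would be invoked. `sp_obj` (a Giotto object assumed to already carry normalized expression, a "Delaunay_network" spatial network and DWLS spatial enrichment) and `ave_exp` (a features x cell-types average expression matrix) are hypothetical placeholders, not objects provided by this patch.

icf <- findICFSpot(
    gobject = sp_obj,
    spat_unit = "cell",
    feat_type = "rna",
    expression_values = "normalized",
    ave_celltype_exp = ave_exp,
    spatial_network_name = "Delaunay_network",
    diff_test = "permutation",
    nr_permutations = 100,
    adjust_method = "fdr",
    set_seed = TRUE,
    seed_number = 1234)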
@param min_int_cells_expr_resi minimum expression residual level for +#' interacting neighbor cell type #' @param min_fdr minimum adjusted p-value #' @param min_pcc_diff minimum absolute pcc difference difference #' @param min_zscore minimum z-score change @@ -1127,62 +1251,68 @@ findICFSpot <- function(gobject, #' #' @return icfObject that contains the filtered differential feature scores #' @export -filterICFSpot = function(icfObject, - min_cells = 4, - min_cells_expr_resi = 0.05, - min_int_cells = 4, - min_int_cells_expr_resi = 0.05, - min_fdr = 0.5, - min_pcc_diff = 0.05, - min_zscore = 0.05, - zscores_column = c('cell_type', 'features'), - direction = c('both', 'up', 'down')) { - - # data.table variables - nr_select = int_nr_select = zscores = pcc_diff = sel = other = p.adj = NULL - log2fc = min_log2_fc = NULL - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - zscores_column = match.arg(zscores_column, choices = c('cell_type', 'features')) +filterICFSpot <- function(icfObject, + min_cells = 4, + min_cells_expr_resi = 0.05, + min_int_cells = 4, + min_int_cells_expr_resi = 0.05, + min_fdr = 0.5, + min_pcc_diff = 0.05, + min_zscore = 0.05, + zscores_column = c("cell_type", "features"), + direction = c("both", "up", "down")) { + # data.table variables + nr_select <- int_nr_select <- zscores <- pcc_diff <- sel <- other <- + p.adj <- NULL + log2fc <- min_log2_fc <- NULL + + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") + } - ICFscore = copy(icfObject[['ICFscores']]) + zscores_column <- match.arg( + zscores_column, choices = c("cell_type", "features")) - # other parameters - direction = match.arg(direction, choices = c('both', 'up', 'down')) + ICFscore <- copy(icfObject[["ICFscores"]]) + # other parameters + direction <- match.arg(direction, choices = c("both", "up", "down")) - ## sequential filter steps ## - # 1. minimum number of source and target cells - selection_scores = ICFscore[nr_select >= min_cells & int_nr_select >= min_int_cells] - # 2. create z-scores for log2fc per cell type - selection_scores[, zscores := scale(pcc_diff), by = c(zscores_column)] + ## sequential filter steps ## + # 1. minimum number of source and target cells + selection_scores <- ICFscore[ + nr_select >= min_cells & int_nr_select >= min_int_cells] - # 3. filter based on z-scores and minimum levels - comb_DT = rbind(selection_scores[zscores >= min_zscore & abs(pcc_diff) >= min_pcc_diff & sel >= min_cells_expr_resi], - selection_scores[zscores <= -min_zscore & abs(pcc_diff) >= min_pcc_diff & other >= min_int_cells_expr_resi]) + # 2. create z-scores for log2fc per cell type + selection_scores[, zscores := scale(pcc_diff), by = c(zscores_column)] - # 4. filter based on adjusted p-value (fdr) - comb_DT = comb_DT[p.adj < min_fdr] + # 3. filter based on z-scores and minimum levels + comb_DT <- rbind( + selection_scores[zscores >= min_zscore & abs( + pcc_diff) >= min_pcc_diff & sel >= min_cells_expr_resi], + selection_scores[zscores <= -min_zscore & abs( + pcc_diff) >= min_pcc_diff & other >= min_int_cells_expr_resi] + ) + # 4. 
filter based on adjusted p-value (fdr) + comb_DT <- comb_DT[p.adj < min_fdr] - if(direction == 'both') { - selection_scores = selection_scores - } else if(direction == 'up') { - selection_scores = selection_scores[log2fc >= min_log2_fc] - } else if(direction == 'down') { - selection_scores = selection_scores[log2fc <= -min_log2_fc] - } + if (direction == "both") { + selection_scores <- selection_scores + } else if (direction == "up") { + selection_scores <- selection_scores[log2fc >= min_log2_fc] + } else if (direction == "down") { + selection_scores <- selection_scores[log2fc <= -min_log2_fc] + } - newobj = copy(icfObject) - newobj[['ICFscores']] = comb_DT - return(newobj) + newobj <- copy(icfObject) + newobj[["ICFscores"]] <- comb_DT + return(newobj) } #' @title plotICFSpot @@ -1198,75 +1328,87 @@ filterICFSpot = function(icfObject, #' @return plot #' @export plotICFSpot <- function(gobject, - icfObject, - source_type, - source_markers, - ICF_features, - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotICFSpot') { - - - # data.table variables - cell_type = int_cell_type = pcc_diff = NULL - - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - ICFscores = icfObject[['ICFscores']] - - # combine features - names(source_markers) = rep('marker', length(source_markers)) - neighbor_types = names(ICF_features) - all_features = c(source_markers, ICF_features) - - # warning if there are features selected that are not detected - detected_features = unique(ICFscores[['features']]) - not_detected_features = all_features[!all_features %in% detected_features] - if(length(not_detected_features) > 0) { - cat('These selected features are not in the icfObject: \n', - not_detected_features, '\n') - } - - # data.table set column names - features = group = NULL - - tempDT = ICFscores[features %in% all_features][cell_type == source_type][int_cell_type %in% neighbor_types] - tempDT[, features := factor(features, levels = all_features)] - tempDT[, group := names(all_features[all_features == features]), by = 1:nrow(tempDT)] - - - if(is.null(cell_color_code)) { - mycolors = set_default_color_discrete_cell(instrs = instructions(gobject))(n = length(unique(tempDT$int_cell_type))) - names(mycolors) = unique(tempDT$int_cell_type) - } else { - mycolors = cell_color_code - } - - - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = ggplot2::element_text(size = 14, angle = 45, vjust = 1, hjust = 1), - axis.text.y = ggplot2::element_text(size = 14), - axis.title = ggplot2::element_text(size = 14)) - pl = pl + ggplot2::geom_bar(data = tempDT, ggplot2::aes(x = features, y = pcc_diff, fill = int_cell_type), stat = 'identity', position = ggplot2::position_dodge()) - pl = pl + ggplot2::scale_fill_manual(values = mycolors) - pl = pl + ggplot2::labs(x = '', title = paste0('fold-change z-scores in ' ,source_type)) - - return(plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + icfObject, + source_type, + source_markers, + ICF_features, + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotICFSpot") { + # data.table variables + cell_type <- 
int_cell_type <- pcc_diff <- NULL + + + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") + } + + ICFscores <- icfObject[["ICFscores"]] + + # combine features + names(source_markers) <- rep("marker", length(source_markers)) + neighbor_types <- names(ICF_features) + all_features <- c(source_markers, ICF_features) + + # warning if there are features selected that are not detected + detected_features <- unique(ICFscores[["features"]]) + not_detected_features <- all_features[!all_features %in% detected_features] + if (length(not_detected_features) > 0) { + cat( + "These selected features are not in the icfObject: \n", + not_detected_features, "\n" + ) + } + + # data.table set column names + features <- group <- NULL + + tempDT <- ICFscores[features %in% all_features][ + cell_type == source_type][int_cell_type %in% neighbor_types] + tempDT[, features := factor(features, levels = all_features)] + tempDT[, group := names(all_features[ + all_features == features]), by = 1:nrow(tempDT)] + + + if (is.null(cell_color_code)) { + mycolors <- set_default_color_discrete_cell( + instrs = instructions(gobject))(n = length(unique( + tempDT$int_cell_type))) + names(mycolors) <- unique(tempDT$int_cell_type) + } else { + mycolors <- cell_color_code + } + + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = ggplot2::element_text( + size = 14, angle = 45, vjust = 1, hjust = 1), + axis.text.y = ggplot2::element_text(size = 14), + axis.title = ggplot2::element_text(size = 14) + ) + pl <- pl + ggplot2::geom_bar( + data = tempDT, + ggplot2::aes(x = features, y = pcc_diff, fill = int_cell_type), + stat = "identity", position = ggplot2::position_dodge()) + pl <- pl + ggplot2::scale_fill_manual(values = mycolors) + pl <- pl + ggplot2::labs(x = "", title = paste0( + "fold-change z-scores in ", source_type)) + + return(plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title plotCellProximityFeatSpot @@ -1288,247 +1430,296 @@ plotICFSpot <- function(gobject, #' @param cell_color_code vector of colors with cell types as names #' @return plot #' @export -plotCellProximityFeatSpot = function(gobject, - icfObject, - method = c('volcano', 'cell_barplot', 'cell-cell', 'cell_sankey', 'heatmap', 'dotplot'), - min_cells = 4, - min_cells_expr_resi = 0.05, - min_int_cells = 4, - min_int_cells_expr_resi = 0.05, - min_fdr = 0.5, - min_pcc_diff = 0.05, - min_zscore = 0.05, - zscores_column = c('cell_type', 'features'), - direction = c('both', 'up', 'down'), - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCellProximityFeats') { - - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - # print, return and save parameters - show_plot = ifelse(is.na(show_plot), readGiottoInstructions(gobject, param = 'show_plot'), show_plot) - save_plot = ifelse(is.na(save_plot), readGiottoInstructions(gobject, param = 'save_plot'), save_plot) - return_plot = ifelse(is.na(return_plot), readGiottoInstructions(gobject, param = 'return_plot'), return_plot) - - - ## first filter - filter_icf = filterICFSpot(icfObject, - min_cells = min_cells, - 
min_cells_expr_resi = min_cells_expr_resi, - min_int_cells = min_int_cells, - min_int_cells_expr_resi = min_int_cells_expr_resi, - min_fdr = min_fdr, - min_pcc_diff = min_pcc_diff, - min_zscore = min_zscore, - zscores_column = c('cell_type', 'features'), - direction = c('both', 'up', 'down')) - - complete_part = filter_icf[['ICFscores']] - - ## other parameters - method = match.arg(method, choices = c('volcano', 'cell_barplot', 'cell-cell', 'cell_sankey', 'heatmap', 'dotplot')) - - - # variables - pcc_diff = p.adj = unif_int = N = cell_type = int_cell_type = NULL - - ## create data.table for visualization - if(method == 'volcano') { - - ## volcanoplot - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_point(data = complete_part, ggplot2::aes(x = pcc_diff, y = ifelse(is.infinite(-log10(p.adj)), 1000, -log10(p.adj)))) - pl = pl + ggplot2::theme_classic() - pl = pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) - pl = pl + ggplot2::labs(x = 'pcc diff', y = '-log10(p.adjusted)') - - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - - } else if(method == 'cell-cell') { - - nr_int_selection_scores = complete_part[, .N, by = unif_int] - order_interactions = nr_int_selection_scores[order(N)]$unif_int - - complete_part[, unif_int := factor(unif_int, order_interactions)] - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::geom_bar(data = complete_part, ggplot2::aes(x = unif_int, fill = unif_int)) - pl <- pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 1)) - pl <- pl + ggplot2::coord_flip() - - ## print plot - if(show_plot == TRUE) { - print(pl) +plotCellProximityFeatSpot <- function(gobject, + icfObject, + method = c("volcano", "cell_barplot", "cell-cell", "cell_sankey", + "heatmap", "dotplot"), + min_cells = 4, + min_cells_expr_resi = 0.05, + min_int_cells = 4, + min_int_cells_expr_resi = 0.05, + min_fdr = 0.5, + min_pcc_diff = 0.05, + min_zscore = 0.05, + zscores_column = c("cell_type", "features"), + direction = c("both", "up", "down"), + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCellProximityFeats") { + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") } - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - - } else if(method == 'cell_barplot') { - - - # by source cell type plot - nr_source_selection_scores = complete_part[, .N, by = cell_type] - order_source = nr_source_selection_scores[order(N)]$cell_type - - complete_part[, cell_type := factor(cell_type, order_source)] - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::geom_bar(data = complete_part, ggplot2::aes(x = cell_type, fill = int_cell_type)) - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } - pl <- pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) - pl <- pl + ggplot2::labs(x = '', y = '# of features influenced by cell neighborhood') - - - 
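Continuing the hypothetical example from the `findICFSpot()` sketch above, a typical filtering call would look like the following (threshold values are arbitrary illustrations). Note that `plotCellProximityFeatSpot()` applies the same `filterICFSpot()` step internally before plotting.

icf_filtered <- filterICFSpot(
    icfObject = icf,
    min_cells = 4,
    min_int_cells = 4,
    min_fdr = 0.1,
    min_pcc_diff = 0.05,
    min_zscore = 0.05,
    zscores_column = "cell_type",
    direction = "both")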
## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'cell_sankey') { - - - # package check for ggalluvial - package_check(pkg_name = 'ggalluvial', repository = 'CRAN') - - - testalluv = complete_part[, .N, by = c('int_cell_type', 'cell_type')] - - # library(ggalluvial) # this is needed for it to work, why?? - # maybe use requireNamespace() instead? - - pl <- ggplot2::ggplot(testalluv, - ggplot2::aes(y = N, axis1 = cell_type, axis2 = int_cell_type)) + - ggalluvial::geom_alluvium(aes(fill = cell_type), width = 1/12) + - ggalluvial::geom_stratum(width = 1/12, fill = "black", color = "grey") + - ggplot2::scale_x_discrete(limits = c("cell type", "neighbours"), expand = c(.05, .05)) + - ggplot2::geom_label(stat = "stratum", label.strata = TRUE, size = 3) + - ggplot2::theme_classic() + ggplot2::labs(x = '', y = '# of features influenced by cell neighborhood') - - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } - - - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'dotplot') { - - changed_features = complete_part[, .N, by = c('cell_type', 'int_cell_type')] - - changed_features[, cell_type := factor(cell_type, unique(cell_type))] - changed_features[, int_cell_type := factor(int_cell_type, unique(int_cell_type))] - - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() - pl = pl + ggplot2::geom_point(data = changed_features, ggplot2::aes(x = cell_type, y = int_cell_type, size = N)) - pl = pl + ggplot2::scale_size_continuous(guide=guide_legend(title = '# of ICFs')) - pl = pl + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 1, hjust = 1)) - pl = pl + ggplot2::labs(x = 'source cell type', y = 'neighbor cell type') - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'heatmap') { - - changed_features = complete_part[, .N, by = c('cell_type', 'int_cell_type')] - - changed_features[, cell_type := factor(cell_type, unique(cell_type))] - changed_features[, int_cell_type := factor(int_cell_type, unique(int_cell_type))] - - changed_features_d = data.table::dcast.data.table(changed_features, cell_type~int_cell_type, value.var = 'N', fill = 0) - changed_features_m = dt_to_matrix(changed_features_d) - - col_fun = GiottoVisuals::colorRamp2(breaks = stats::quantile(log2(changed_features_m+1)), - colors = c("white", 'white', "blue", "yellow", "red")) - - heatm = ComplexHeatmap::Heatmap(as.matrix(log2(changed_features_m+1)), col = col_fun, - row_title = 'cell_type', column_title = 'int_cell_type', heatmap_legend_param = list(title = 'log2(# DEGs)')) - - ## print plot - if(show_plot == TRUE) { - print(heatm) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, 
plot_object = heatm, default_save_name = default_save_name), save_param)) - } + # print, return and save parameters + show_plot <- ifelse( + is.na(show_plot), readGiottoInstructions(gobject, param = "show_plot"), + show_plot) + save_plot <- ifelse( + is.na(save_plot), readGiottoInstructions(gobject, param = "save_plot"), + save_plot) + return_plot <- ifelse( + is.na(return_plot), + readGiottoInstructions(gobject, param = "return_plot"), return_plot) + + + ## first filter + filter_icf <- filterICFSpot(icfObject, + min_cells = min_cells, + min_cells_expr_resi = min_cells_expr_resi, + min_int_cells = min_int_cells, + min_int_cells_expr_resi = min_int_cells_expr_resi, + min_fdr = min_fdr, + min_pcc_diff = min_pcc_diff, + min_zscore = min_zscore, + zscores_column = c("cell_type", "features"), + direction = c("both", "up", "down") + ) - ## return plot - if(return_plot == TRUE) { - return(heatm) + complete_part <- filter_icf[["ICFscores"]] + + ## other parameters + method <- match.arg( + method, + choices = c("volcano", "cell_barplot", "cell-cell", "cell_sankey", + "heatmap", "dotplot")) + + + # variables + pcc_diff <- p.adj <- unif_int <- N <- cell_type <- int_cell_type <- NULL + + ## create data.table for visualization + if (method == "volcano") { + ## volcanoplot + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_point( + data = complete_part, + ggplot2::aes( + x = pcc_diff, + y = ifelse(is.infinite(-log10(p.adj)), 1000, -log10(p.adj)))) + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) + pl <- pl + ggplot2::labs(x = "pcc diff", y = "-log10(p.adjusted)") + + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "cell-cell") { + nr_int_selection_scores <- complete_part[, .N, by = unif_int] + order_interactions <- nr_int_selection_scores[order(N)]$unif_int + + complete_part[, unif_int := factor(unif_int, order_interactions)] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_bar( + data = complete_part, + ggplot2::aes(x = unif_int, fill = unif_int)) + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = ggplot2::element_text + (angle = 90, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::coord_flip() + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "cell_barplot") { + # by source cell type plot + nr_source_selection_scores <- complete_part[, .N, by = cell_type] + order_source <- nr_source_selection_scores[order(N)]$cell_type + + complete_part[, cell_type := factor(cell_type, order_source)] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_bar( + data = complete_part, + ggplot2::aes(x = cell_type, fill = int_cell_type)) + if (!is.null(cell_color_code)) { + pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) + } + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::labs( + x = "", y = "# of features influenced by cell neighborhood") + + + ## print plot + if 
(show_plot == TRUE) {
+            print(pl)
+        }
+
+        ## save plot
+        if (save_plot == TRUE) {
+            do.call(
+                "all_plots_save_function",
+                c(list(gobject = gobject, plot_object = pl,
+                    default_save_name = default_save_name), save_param))
+        }
+
+        ## return plot
+        if (return_plot == TRUE) {
+            return(pl)
+        }
+    } else if (method == "cell_sankey") {
+        # package check for ggalluvial
+        package_check(pkg_name = "ggalluvial", repository = "CRAN")
+
+
+        testalluv <- complete_part[, .N, by = c("int_cell_type", "cell_type")]
+
+
+        pl <- ggplot2::ggplot(
+            testalluv,
+            ggplot2::aes(y = N, axis1 = cell_type, axis2 = int_cell_type)
+        ) +
+            ggalluvial::geom_alluvium(aes(fill = cell_type), width = 1 / 12) +
+            ggalluvial::geom_stratum(
+                width = 1 / 12, fill = "black", color = "grey") +
+            ggplot2::scale_x_discrete(
+                limits = c("cell type", "neighbours"), expand = c(.05, .05)) +
+            ggplot2::geom_label(
+                stat = "stratum", label.strata = TRUE, size = 3) +
+            ggplot2::theme_classic() +
+            ggplot2::labs(
+                x = "", y = "# of features influenced by cell neighborhood")
+
+        if (!is.null(cell_color_code)) {
+            pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code)
+        }
+
+
+
+        ## print plot
+        if (show_plot == TRUE) {
+            print(pl)
+        }
+
+        ## save plot
+        if (save_plot == TRUE) {
+            do.call(
+                "all_plots_save_function",
+                c(list(gobject = gobject, plot_object = pl,
+                    default_save_name = default_save_name), save_param))
+        }
+
+        ## return plot
+        if (return_plot == TRUE) {
+            return(pl)
+        }
+    } else if (method == "dotplot") {
+        changed_features <- complete_part[
+            , .N, by = c("cell_type", "int_cell_type")]
+
+        changed_features[, cell_type := factor(cell_type, unique(cell_type))]
+        changed_features[, int_cell_type := factor(
+            int_cell_type, unique(int_cell_type))]
+
+        pl <- ggplot2::ggplot()
+        pl <- pl + ggplot2::theme_classic()
+        pl <- pl + ggplot2::geom_point(
+            data = changed_features,
+            ggplot2::aes(x = cell_type, y = int_cell_type, size = N))
+        pl <- pl + ggplot2::scale_size_continuous(
+            guide = guide_legend(title = "# of ICFs"))
+        pl <- pl + ggplot2::theme(axis.text.x = ggplot2::element_text(
+            angle = 90, vjust = 1, hjust = 1))
+        pl <- pl + ggplot2::labs(
+            x = "source cell type", y = "neighbor cell type")
+
+        ## print plot
+        if (show_plot == TRUE) {
+            print(pl)
+        }
+
+        ## save plot
+        if (save_plot == TRUE) {
+            do.call(
+                "all_plots_save_function",
+                c(list(gobject = gobject, plot_object = pl,
+                    default_save_name = default_save_name), save_param))
+        }
+
+        ## return plot
+        if (return_plot == TRUE) {
+            return(pl)
+        }
+    } else if (method == "heatmap") {
+        changed_features <- complete_part[
+            , .N, by = c("cell_type", "int_cell_type")]
+
+        changed_features[, cell_type := factor(cell_type, unique(cell_type))]
+        changed_features[, int_cell_type := factor(
+            int_cell_type, unique(int_cell_type))]
+
+        changed_features_d <- data.table::dcast.data.table(
+            changed_features,
+            cell_type ~ int_cell_type,
+            value.var = "N",
+            fill = 0)
+        changed_features_m <- dt_to_matrix(changed_features_d)
+
+        col_fun <- GiottoVisuals::colorRamp2(
+            breaks = stats::quantile(log2(changed_features_m + 1)),
+            colors = c("white", "white", "blue", "yellow", "red")
+        )
+
+        heatm <- ComplexHeatmap::Heatmap(as.matrix(log2(
+            changed_features_m + 1)),
+        col = col_fun,
+        row_title = "cell_type",
+        column_title = "int_cell_type",
+        heatmap_legend_param = list(title = "log2(# DEGs)")
+        )
+
+        ## print plot
+        if (show_plot == TRUE) {
+            print(heatm)
+        }
+
+        ## save plot
+        if (save_plot == TRUE) {
+            do.call(
+                "all_plots_save_function",
+                c(list(gobject = gobject,
plot_object = heatm, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(heatm) + } } - - } } @@ -1538,7 +1729,8 @@ plotCellProximityFeatSpot = function(gobject, #' @title Specific cell-cell communication scores for spot data #' @name .specific_CCCScores_spots -#' @description Specific Cell-Cell communication scores based on spatial expression of interacting cells at spots resolution +#' @description Specific Cell-Cell communication scores based on spatial +#' expression of interacting cells at spots resolution #' #' @param gobject giotto object to use #' @param spat_unit spatial unit (e.g. 'cell') @@ -1551,18 +1743,22 @@ plotCellProximityFeatSpot = function(gobject, #' @param cell_type_2 second cell type #' @param feature_set_1 first specific feature set from feature pairs #' @param feature_set_2 second specific feature set from feature pairs -#' @param min_observations minimum number of interactions needed to be considered -#' @param detailed provide more detailed information (random variance and z-score) +#' @param min_observations minimum number of interactions needed to be +#' considered +#' @param detailed provide more detailed information +#' (random variance and z-score) #' @param adjust_method which method to adjust p-values #' @param adjust_target adjust multiple hypotheses at the cell or feature level #' @param set_seed set a seed for reproducibility #' @param seed_number seed number #' @param verbose verbose #' -#' @return Cell-Cell communication scores for feature pairs based on spatial interaction -#' @details Statistical framework to identify if pairs of features (such as ligand-receptor combinations) -#' are expressed at higher levels than expected based on a reshuffled null distribution -#' of feature expression values in cells that are spatially in proximity to eachother. +#' @return Cell-Cell communication scores for feature pairs based on spatial +#' interaction +#' @details Statistical framework to identify if pairs of features +#' (such as ligand-receptor combinations) are expressed at higher levels than +#' expected based on a reshuffled null distribution of feature expression +#' values in cells that are spatially in proximity to each other. 
#' \itemize{ #' \item{LR_comb:}{Pair of ligand and receptor} #' \item{lig_cell_type:}{ cell type to assess expression level of ligand } @@ -1585,218 +1781,233 @@ plotCellProximityFeatSpot = function(gobject, #' \item{PI:}{ significanc score: log2fc * -log10(p.adj) } #' } #' @keywords internal -.specific_CCCScores_spots = function(gobject, - spat_unit = NULL, - feat_type = NULL, - expr_residual, - dwls_values, - proximityMat, - random_iter = 1000, - cell_type_1 = 'astrocytes', - cell_type_2 = 'endothelial', - feature_set_1, - feature_set_2, - min_observations = 2, - detailed = FALSE, - adjust_method = c('fdr', 'bonferroni',' BH', 'holm', 'hochberg', 'hommel', - 'BY','none'), - adjust_target = c('features', 'cells'), - set_seed = FALSE, - seed_number = 1234, - verbose = FALSE){ - - # data.table variables - from_to = cell_ID = lig_cell_type = rec_cell_type = lig_nr = rec_nr = rand_expr = NULL - av_diff = log2fc = LR_expr = pvalue = LR_cell_comb = p.adj = LR_comb = PI = NULL - sd_diff = z_score = ligand = receptor = NULL - - # get parameters - adjust_method = match.arg(adjust_method, choices = c("fdr", "bonferroni","BH", "holm", "hochberg", "hommel", - "BY", "none")) - adjust_target = match.arg(adjust_target, choices = c('features', 'cells')) - - # select cell_ids with cell-types - cell_direction_1 = paste0(cell_type_1,'--',cell_type_2) - cell_direction_2 = paste0(cell_type_2,'--',cell_type_1) - - if(verbose) print(paste0('Processing specific CCC Scores: ', cell_direction_1)) - - proxi_1 = proximityMat[cell_direction_1,] - proxi_2 = proximityMat[cell_direction_2,] - - ct1_cell_ids = names(proxi_1[proxi_1 > 0]) - ct2_cell_ids = names(proxi_2[proxi_2 > 0]) - - # dwls value for cell types - dwls_ct1 = dwls_values[,cell_type_1] - dwls_ct2 = dwls_values[,cell_type_2] - - # make sure that there are sufficient observations - if(length(ct1_cell_ids) <= min_observations | length(ct2_cell_ids) <= min_observations) { - - return(NULL) - - } else { - - # get feature expression residual for ligand and receptor - expr_res_L = expr_residual[feature_set_1, ct1_cell_ids] - expr_res_R = expr_residual[feature_set_2, ct2_cell_ids] - - # compute Ligand value - lig_expr = t(t(expr_res_L) * dwls_ct1[ct1_cell_ids]) - rec_expr = t(t(expr_res_R) * dwls_ct2[ct2_cell_ids]) - - lig_expr = round(rowMeans(lig_expr), 7) - rec_expr = round(rowMeans(rec_expr), 7) - - comScore = data.table::data.table(LR_comb = paste0(feature_set_1, '-', feature_set_2), - lig_cell_type = rep(cell_type_1, length(feature_set_1)), - lig_expr = lig_expr, - ligand = feature_set_1, - rec_cell_type = rep(cell_type_2, length(feature_set_2)), - rec_expr = rec_expr, - receptor = feature_set_2, - lig_nr = rep(length(ct1_cell_ids), length(feature_set_1)), - rec_nr = rep(length(ct2_cell_ids), length(feature_set_1)) - ) - - comScore[, LR_expr := lig_expr + rec_expr] - comScore = comScore[, .(LR_comb, lig_cell_type, lig_expr, ligand, - rec_cell_type, rec_expr, receptor, LR_expr, lig_nr, rec_nr)] - - # prepare for randomized scores - total_av = rep(0, nrow(comScore)) - - if(detailed == FALSE) { - total_sum = rep(0, nrow(comScore)) +.specific_CCCScores_spots <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + expr_residual, + dwls_values, + proximityMat, + random_iter = 1000, + cell_type_1 = "astrocytes", + cell_type_2 = "endothelial", + feature_set_1, + feature_set_2, + min_observations = 2, + detailed = FALSE, + adjust_method = c( + "fdr", "bonferroni", " BH", "holm", "hochberg", "hommel", + "BY", "none" + ), + adjust_target = c("features", 
"cells"), + set_seed = FALSE, + seed_number = 1234, + verbose = FALSE) { + # data.table variables + from_to <- cell_ID <- lig_cell_type <- rec_cell_type <- lig_nr <- + rec_nr <- rand_expr <- NULL + av_diff <- log2fc <- LR_expr <- pvalue <- LR_cell_comb <- p.adj <- + LR_comb <- PI <- NULL + sd_diff <- z_score <- ligand <- receptor <- NULL + + # get parameters + adjust_method <- match.arg(adjust_method, choices = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + )) + adjust_target <- match.arg(adjust_target, choices = c("features", "cells")) + + # select cell_ids with cell-types + cell_direction_1 <- paste0(cell_type_1, "--", cell_type_2) + cell_direction_2 <- paste0(cell_type_2, "--", cell_type_1) + + if (verbose) print(paste0( + "Processing specific CCC Scores: ", cell_direction_1)) + + proxi_1 <- proximityMat[cell_direction_1, ] + proxi_2 <- proximityMat[cell_direction_2, ] + + ct1_cell_ids <- names(proxi_1[proxi_1 > 0]) + ct2_cell_ids <- names(proxi_2[proxi_2 > 0]) + + # dwls value for cell types + dwls_ct1 <- dwls_values[, cell_type_1] + dwls_ct2 <- dwls_values[, cell_type_2] + + # make sure that there are sufficient observations + if (length(ct1_cell_ids) <= min_observations | + length(ct2_cell_ids) <= min_observations) { + return(NULL) } else { - total_sum = matrix(nrow = nrow(comScore), ncol = random_iter) + # get feature expression residual for ligand and receptor + expr_res_L <- expr_residual[feature_set_1, ct1_cell_ids] + expr_res_R <- expr_residual[feature_set_2, ct2_cell_ids] + + # compute Ligand value + lig_expr <- t(t(expr_res_L) * dwls_ct1[ct1_cell_ids]) + rec_expr <- t(t(expr_res_R) * dwls_ct2[ct2_cell_ids]) + + lig_expr <- round(rowMeans(lig_expr), 7) + rec_expr <- round(rowMeans(rec_expr), 7) + + comScore <- data.table::data.table( + LR_comb = paste0(feature_set_1, "-", feature_set_2), + lig_cell_type = rep(cell_type_1, length(feature_set_1)), + lig_expr = lig_expr, + ligand = feature_set_1, + rec_cell_type = rep(cell_type_2, length(feature_set_2)), + rec_expr = rec_expr, + receptor = feature_set_2, + lig_nr = rep(length(ct1_cell_ids), length(feature_set_1)), + rec_nr = rep(length(ct2_cell_ids), length(feature_set_1)) + ) + + comScore[, LR_expr := lig_expr + rec_expr] + comScore <- comScore[, .( + LR_comb, lig_cell_type, lig_expr, ligand, + rec_cell_type, rec_expr, receptor, LR_expr, lig_nr, rec_nr + )] + + # prepare for randomized scores + total_av <- rep(0, nrow(comScore)) + + if (detailed == FALSE) { + total_sum <- rep(0, nrow(comScore)) + } else { + total_sum <- matrix(nrow = nrow(comScore), ncol = random_iter) + } + + total_bool <- rep(0, nrow(comScore)) + + + all_cell_ids <- colnames(expr_residual) + + ## simulations ## + for (sim in 1:random_iter) { + if (verbose == TRUE) cat("simulation ", sim, "\n") + + # get random ids and subset + if (set_seed == TRUE) { + seed_number <- seed_number + sim + set.seed(seed = seed_number) + } + + random_ids_1 <- sample( + all_cell_ids, size = length(ct1_cell_ids), replace = FALSE) + random_ids_2 <- sample( + all_cell_ids, size = length(ct2_cell_ids), replace = FALSE) + + # get feature expression residual for ligand and receptor + random_expr_res_L <- expr_residual[feature_set_1, random_ids_1] + random_expr_res_R <- expr_residual[feature_set_2, random_ids_2] + + # compute Ligand value + random_lig_expr <- t(t(random_expr_res_L) * dwls_ct1[random_ids_1]) + random_rec_expr <- t(t(random_expr_res_R) * dwls_ct2[random_ids_2]) + + random_lig_expr <- round(rowMeans(random_lig_expr), 7) + random_rec_expr <- 
round(rowMeans(random_rec_expr), 7) + + randomScore <- data.table::data.table( + lig_expr = random_lig_expr, + rec_expr = random_rec_expr + ) + randomScore <- randomScore[, LR_expr := lig_expr + rec_expr] + + # average random score + total_av <- total_av + randomScore[["LR_expr"]] + + # difference between observed and random + difference <- comScore[["LR_expr"]] - randomScore[["LR_expr"]] + + # calculate total difference + if (detailed == FALSE) { + total_sum <- total_sum + difference + } else { + total_sum[, sim] <- difference + } + + # calculate p-values + difference[difference > 0] <- 1 + difference[difference < 0] <- -1 + total_bool <- total_bool + difference + } + + comScore[, rand_expr := total_av / random_iter] + + if (detailed == TRUE) { + av_difference_scores <- rowMeans_flex(total_sum) + sd_difference_scores <- apply( + total_sum, MARGIN = 1, FUN = stats::sd) + + comScore[, av_diff := av_difference_scores] + comScore[, sd_diff := sd_difference_scores] + comScore[, z_score := (LR_expr - rand_expr) / sd_diff] + } else { + comScore[, av_diff := total_sum / random_iter] + } + + comScore[, log2fc := LR_expr - rand_expr] + comScore[, pvalue := total_bool / random_iter] + comScore[, pvalue := ifelse(pvalue > 0, 1 - pvalue, 1 + pvalue)] + comScore[, LR_cell_comb := paste0(lig_cell_type, "--", rec_cell_type)] + + if (adjust_target == "features") { + comScore[, p.adj := stats::p.adjust( + pvalue, method = adjust_method), by = .(LR_cell_comb)] + } else if (adjust_target == "cells") { + comScore[, p.adj := stats::p.adjust( + pvalue, method = adjust_method), by = .(LR_comb)] + } + + # get minimum adjusted p.value that is not zero + all_p.adj <- comScore[["p.adj"]] + lowest_p.adj <- min(all_p.adj[all_p.adj != 0]) + comScore[, PI := ifelse( + p.adj == 0, + log2fc * (-log10(lowest_p.adj)), + log2fc * (-log10(p.adj)))] + + return(comScore) } - - total_bool = rep(0, nrow(comScore)) - - # all_cell_ids = pDataDT(gobject = gobject, - # spat_unit = spat_unit, - # feat_type = feat_type)$cell_ID - - all_cell_ids = colnames(expr_residual) - - ## simulations ## - for(sim in 1:random_iter) { - - if(verbose == TRUE) cat('simulation ', sim, '\n') - - # get random ids and subset - if(set_seed == TRUE) { - seed_number = seed_number+sim - set.seed(seed = seed_number) - } - #random_ids_1 = all_cell_ids[sample(length(all_cell_ids), size = length(ct1_cell_ids))] - #random_ids_2 = all_cell_ids[sample(length(all_cell_ids), size = length(ct2_cell_ids))] - - random_ids_1 = sample(all_cell_ids, size = length(ct1_cell_ids), replace = FALSE) - random_ids_2 = sample(all_cell_ids, size = length(ct2_cell_ids), replace = FALSE) - - # get feature expression residual for ligand and receptor - random_expr_res_L = expr_residual[feature_set_1, random_ids_1] - random_expr_res_R = expr_residual[feature_set_2, random_ids_2] - - # compute Ligand value - random_lig_expr = t(t(random_expr_res_L) * dwls_ct1[random_ids_1]) - random_rec_expr = t(t(random_expr_res_R) * dwls_ct2[random_ids_2]) - - random_lig_expr = round(rowMeans(random_lig_expr), 7) - random_rec_expr = round(rowMeans(random_rec_expr), 7) - - randomScore = data.table::data.table(lig_expr = random_lig_expr, - rec_expr = random_rec_expr) - randomScore = randomScore[, LR_expr := lig_expr + rec_expr] - - # average random score - total_av = total_av + randomScore[['LR_expr']] - - # difference between observed and random - difference = comScore[['LR_expr']] - randomScore[['LR_expr']] - - # calculate total difference - if(detailed == FALSE) { - total_sum = total_sum+difference - } 
else { - total_sum[,sim] = difference - } - - # calculate p-values - difference[difference > 0] = 1 - difference[difference < 0] = -1 - total_bool = total_bool + difference - - } - - comScore[, rand_expr := total_av/random_iter] - - if(detailed == TRUE) { - av_difference_scores = rowMeans_flex(total_sum) - sd_difference_scores = apply(total_sum, MARGIN = 1, FUN = stats::sd) - - comScore[, av_diff := av_difference_scores] - comScore[, sd_diff := sd_difference_scores] - comScore[, z_score := (LR_expr - rand_expr)/sd_diff] - - } else { - comScore[, av_diff := total_sum/random_iter] - } - - comScore[, log2fc := LR_expr - rand_expr] - comScore[, pvalue := total_bool/random_iter] - comScore[, pvalue := ifelse(pvalue > 0, 1-pvalue, 1+pvalue)] - comScore[, LR_cell_comb := paste0(lig_cell_type,'--',rec_cell_type)] - - if(adjust_target == 'features') { - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_cell_comb)] - } else if(adjust_target == 'cells'){ - comScore[, p.adj := stats::p.adjust(pvalue, method = adjust_method), by = .(LR_comb)] - } - - # get minimum adjusted p.value that is not zero - all_p.adj = comScore[['p.adj']] - lowest_p.adj = min(all_p.adj[all_p.adj != 0]) - comScore[, PI := ifelse(p.adj == 0, log2fc*(-log10(lowest_p.adj)), log2fc*(-log10(p.adj)))] - - return(comScore) - - } } #' @title spatCellCellcomSpots #' @name spatCellCellcomSpots -#' @description Spatial Cell-Cell communication scores based on spatial expression of interacting cells at spots resolution +#' @description Spatial Cell-Cell communication scores based on spatial +#' expression of interacting cells at spots resolution #' #' @param gobject giotto object to use #' @param spat_unit spatial unit (e.g. 'cell') #' @param feat_type feature type (e.g. 'rna') #' @param ave_celltype_exp Matrix with average expression per cell type #' @param expression_values (e.g. 'normalized', 'scaled', 'custom') -#' @param spatial_network_name spatial network to use for identifying interacting cells +#' @param spatial_network_name spatial network to use for identifying +#' interacting cells #' @param cluster_column cluster column with cell type information #' @param random_iter number of iterations #' @param feature_set_1 first specific feature set from feature pairs #' @param feature_set_2 second specific feature set from feature pairs -#' @param min_observations minimum number of interactions needed to be considered -#' @param detailed provide more detailed information (random variance and z-score) +#' @param min_observations minimum number of interactions needed to be +#' considered +#' @param detailed provide more detailed information +#' (random variance and z-score) #' @param adjust_method which method to adjust p-values -#' @param adjust_target adjust multiple hypotheses at the cell or feature level +#' @param adjust_target adjust multiple hypotheses at the cell or feature +#' level #' @param do_parallel run calculations in parallel with mclapply #' @param cores number of cores to use if do_parallel = TRUE #' @param set_seed set a seed for reproducibility #' @param seed_number seed number #' @param verbose verbose (e.g. 
'a little', 'a lot', 'none') #' -#' @return Cell-Cell communication scores for feature pairs based on spatial interaction -#' @details Statistical framework to identify if pairs of features (such as ligand-receptor combinations) -#' are expressed at higher levels than expected based on a reshuffled null distribution -#' of feature expression values in cells that are spatially in proximity to eachother.. +#' @return Cell-Cell communication scores for feature pairs based on spatial +#' interaction +#' @details Statistical framework to identify if pairs of features +#' (such as ligand-receptor combinations) are expressed at higher levels than +#' expected based on a reshuffled null distribution of feature expression +#' values in cells that are spatially in proximity to each other. #' \itemize{ #' \item{LR_comb:}{Pair of ligand and receptor} #' \item{lig_cell_type:}{ cell type to assess expression level of ligand } @@ -1819,175 +2030,183 @@ plotCellProximityFeatSpot = function(gobject, #' \item{PI:}{ significanc score: log2fc * -log10(p.adj) } #' } #' @export -spatCellCellcomSpots = function(gobject, - spat_unit = NULL, - feat_type = NULL, - ave_celltype_exp, - spatial_network_name = 'Delaunay_network', - cluster_column = 'cell_ID', - random_iter = 1000, - feature_set_1, - feature_set_2, - min_observations = 2, - expression_values = c('normalized', 'scaled', 'custom'), - detailed = FALSE, - adjust_method = c('fdr', 'bonferroni', 'BH', 'holm', 'hochberg', 'hommel', - 'BY', 'none'), - adjust_target = c('features', 'cells'), - do_parallel = TRUE, - cores = NA, - set_seed = TRUE, - seed_number = 1234, - verbose = c('a little', 'a lot', 'none')){ - - # data.table vars - V1 = V2 = LR_cell_comb = NULL - - # code start - verbose = match.arg(verbose, choices = c('a little', 'a lot', 'none')) - if(verbose %in% c('a little', 'none')) { - specific_verbose = F +spatCellCellcomSpots <- function(gobject, + spat_unit = NULL, + feat_type = NULL, + ave_celltype_exp, + spatial_network_name = "Delaunay_network", + cluster_column = "cell_ID", + random_iter = 1000, + feature_set_1, + feature_set_2, + min_observations = 2, + expression_values = c("normalized", "scaled", "custom"), + detailed = FALSE, + adjust_method = c( + "fdr", "bonferroni", "BH", "holm", "hochberg", "hommel", + "BY", "none" + ), + adjust_target = c("features", "cells"), + do_parallel = TRUE, + cores = NA, + set_seed = TRUE, + seed_number = 1234, + verbose = c("a little", "a lot", "none")) { + # data.table vars + V1 <- V2 <- LR_cell_comb <- NULL + + # code start + verbose <- match.arg(verbose, choices = c("a little", "a lot", "none")) + if (verbose %in% c("a little", "none")) { + specific_verbose <- FALSE } else { - specific_verbose = T - } - - ## check if spatial network exists ## - spat_networks = names(gobject@spatial_network[[spat_unit]]) - - if(!spatial_network_name %in% spat_networks) { - stop(spatial_network_name, ' is not an existing spatial network \n', - 'use showGiottoSpatNetworks() to see the available networks \n', - 'or create a new spatial network with createSpatialNetwork() \n') - } - - - # expression data - values = match.arg(expression_values, choices = c('normalized', 'scaled', 'custom')) - expr_residual = .cal_expr_residual(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - ave_celltype_exp = ave_celltype_exp) - - # compute cell proximity for each spot - proximityMat = cellProximityEnrichmentEachSpot(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - spatial_network_name = 
spatial_network_name) - - # select overlapped spots - intersect_cell_IDs = intersect(colnames(expr_residual), colnames(proximityMat)) - expr_residual = expr_residual[, intersect_cell_IDs] - proximityMat = proximityMat[, intersect_cell_IDs] - - # exact spatial_enrichment matrix - dwls_values = get_spatial_enrichment(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table') - data.table::setDF(dwls_values) - rownames_dwls = dwls_values[,'cell_ID'] - dwls_values = as.matrix(dwls_values[,-1]) - rownames(dwls_values) = rownames_dwls - - # check feature list - LR_comb = data.table::data.table(ligand = feature_set_1, receptor = feature_set_2) - # check LR pair not captured in giotto object - LR_out = LR_comb[!LR_comb$ligand %in% rownames(expr_residual) | !LR_comb$receptor %in% rownames(expr_residual)] - - if (dim(LR_out)[1] > 0){ - print('Ligand or receptor were removed after computing expresion residual.') - print(LR_out) - LR_comb = LR_comb[LR_comb$ligand %in% rownames(expr_residual) & LR_comb$receptor %in% rownames(expr_residual) ] - feature_set_1 = LR_comb$ligand - feature_set_2 = LR_comb$receptor - } - - ## get all combinations between cell types - combn_DT = data.table::data.table(LR_cell_comb = rownames(proximityMat)) - combn_DT[, V1 := strsplit(LR_cell_comb, '--')[[1]][1], by = 1:nrow(combn_DT)] - combn_DT[, V2 := strsplit(LR_cell_comb, '--')[[1]][2], by = 1:nrow(combn_DT)] - - ## parallel option ## - if(do_parallel == TRUE) { - - - savelist = lapply_flex(X = 1:nrow(combn_DT), cores = cores, fun = function(row) { - - cell_type_1 = combn_DT[row][['V1']] - cell_type_2 = combn_DT[row][['V2']] - - - specific_scores = .specific_CCCScores_spots(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expr_residual = expr_residual, - dwls_values = dwls_values, - proximityMat = proximityMat, - random_iter = random_iter, - cell_type_1 = cell_type_1, - cell_type_2 = cell_type_2, - feature_set_1 = feature_set_1, - feature_set_2 = feature_set_2, - min_observations = min_observations, - detailed = detailed, - adjust_method = adjust_method, - adjust_target = adjust_target, - set_seed = set_seed, - seed_number = seed_number) - }) - - } else { - - ## for loop over all combinations ## - savelist = list() - countdown = nrow(combn_DT) - - for(row in 1:nrow(combn_DT)) { - - cell_type_1 = combn_DT[row][['V1']] - cell_type_2 = combn_DT[row][['V2']] - - if(verbose == 'a little' | verbose == 'a lot') cat('\n\n PROCESS nr ', countdown,': ', cell_type_1, ' and ', cell_type_2, '\n\n') - - specific_scores = .specific_CCCScores_spots(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - expr_residual = expr_residual, - dwls_values = dwls_values, - proximityMat = proximityMat, - random_iter = random_iter, - cell_type_1 = cell_type_1, - cell_type_2 = cell_type_2, - feature_set_1 = feature_set_1, - feature_set_2 = feature_set_2, - min_observations = min_observations, - detailed = detailed, - adjust_method = adjust_method, - adjust_target = adjust_target, - set_seed = set_seed, - seed_number = seed_number, - verbose = specific_verbose) - savelist[[row]] = specific_scores - countdown = countdown - 1 + specific_verbose <- TRUE } - } - - finalDT = do.call('rbind', savelist) - - # data.table variables - LR_comb = LR_expr = NULL - - data.table::setorder(finalDT, LR_comb, -LR_expr) + ## check if spatial network exists ## + spat_networks <- names(gobject@spatial_network[[spat_unit]]) - return(finalDT) + if (!spatial_network_name %in% spat_networks) { + stop( + 
spatial_network_name, " is not an existing spatial network \n", + "use showGiottoSpatNetworks() to see the available networks \n", + "or create a new spatial network with createSpatialNetwork()" + ) + } -} + # expression data + values <- match.arg( + expression_values, choices = c("normalized", "scaled", "custom")) + expr_residual <- .cal_expr_residual( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + ave_celltype_exp = ave_celltype_exp + ) + # compute cell proximity for each spot + proximityMat <- cellProximityEnrichmentEachSpot( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + spatial_network_name = spatial_network_name + ) + # select overlapped spots + intersect_cell_IDs <- intersect( + colnames(expr_residual), colnames(proximityMat)) + expr_residual <- expr_residual[, intersect_cell_IDs] + proximityMat <- proximityMat[, intersect_cell_IDs] + + # exact spatial_enrichment matrix + dwls_values <- get_spatial_enrichment( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table" + ) + data.table::setDF(dwls_values) + rownames_dwls <- dwls_values[, "cell_ID"] + dwls_values <- as.matrix(dwls_values[, -1]) + rownames(dwls_values) <- rownames_dwls + + # check feature list + LR_comb <- data.table::data.table( + ligand = feature_set_1, receptor = feature_set_2) + # check LR pair not captured in giotto object + LR_out <- LR_comb[!LR_comb$ligand %in% rownames( + expr_residual) | !LR_comb$receptor %in% rownames(expr_residual)] + + if (dim(LR_out)[1] > 0) { + message("Ligand or receptor were removed after computing expresion + residual.") + print(LR_out) + LR_comb <- LR_comb[LR_comb$ligand %in% rownames(expr_residual) & + LR_comb$receptor %in% rownames(expr_residual)] + feature_set_1 <- LR_comb$ligand + feature_set_2 <- LR_comb$receptor + } + ## get all combinations between cell types + combn_DT <- data.table::data.table(LR_cell_comb = rownames(proximityMat)) + combn_DT[, V1 := strsplit( + LR_cell_comb, "--")[[1]][1], by = 1:nrow(combn_DT)] + combn_DT[, V2 := strsplit( + LR_cell_comb, "--")[[1]][2], by = 1:nrow(combn_DT)] + + ## parallel option ## + if (do_parallel == TRUE) { + savelist <- lapply_flex( + X = 1:nrow(combn_DT), cores = cores, fun = function(row) { + cell_type_1 <- combn_DT[row][["V1"]] + cell_type_2 <- combn_DT[row][["V2"]] + + + specific_scores <- .specific_CCCScores_spots( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expr_residual = expr_residual, + dwls_values = dwls_values, + proximityMat = proximityMat, + random_iter = random_iter, + cell_type_1 = cell_type_1, + cell_type_2 = cell_type_2, + feature_set_1 = feature_set_1, + feature_set_2 = feature_set_2, + min_observations = min_observations, + detailed = detailed, + adjust_method = adjust_method, + adjust_target = adjust_target, + set_seed = set_seed, + seed_number = seed_number + ) + }) + } else { + ## for loop over all combinations ## + savelist <- list() + countdown <- nrow(combn_DT) + + for (row in 1:nrow(combn_DT)) { + cell_type_1 <- combn_DT[row][["V1"]] + cell_type_2 <- combn_DT[row][["V2"]] + + if (verbose == "a little" | verbose == "a lot") + cat("PROCESS nr ", countdown, ": ", + cell_type_1, " and ", cell_type_2) + + specific_scores <- .specific_CCCScores_spots( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + expr_residual = expr_residual, + dwls_values = dwls_values, + proximityMat = proximityMat, + random_iter = random_iter, + cell_type_1 = cell_type_1, + cell_type_2 = cell_type_2, + 
feature_set_1 = feature_set_1, + feature_set_2 = feature_set_2, + min_observations = min_observations, + detailed = detailed, + adjust_method = adjust_method, + adjust_target = adjust_target, + set_seed = set_seed, + seed_number = seed_number, + verbose = specific_verbose + ) + savelist[[row]] <- specific_scores + countdown <- countdown - 1 + } + } + finalDT <- do.call("rbind", savelist) + # data.table variables + LR_comb <- LR_expr <- NULL + data.table::setorder(finalDT, LR_comb, -LR_expr) + return(finalDT) +} diff --git a/R/spatial_interaction_visuals.R b/R/spatial_interaction_visuals.R index aeffa0cec..60ae823ea 100644 --- a/R/spatial_interaction_visuals.R +++ b/R/spatial_interaction_visuals.R @@ -1,5 +1,3 @@ - - #' @title cellProximityBarplot #' @name cellProximityBarplot #' @description Create barplot from cell-cell proximity scores @@ -13,55 +11,65 @@ #' @details This function creates a barplot that shows the spatial proximity #' enrichment or depletion of cell type pairs. #' @export -cellProximityBarplot = function(gobject, - CPscore, - min_orig_ints = 5, - min_sim_ints = 5, - p_val = 0.05, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'cellProximityBarplot') { - - - table_mean_results_dc = CPscore$enrichm_res - - ## filter to remove low number of cell-cell proximity interactions ## - - # data.table variables - original = simulations = p_higher_orig = p_lower_orig = enrichm = type_int = unified_int = NULL - - table_mean_results_dc_filter = table_mean_results_dc[original >= min_orig_ints & simulations >= min_sim_ints,] - table_mean_results_dc_filter = table_mean_results_dc_filter[p_higher_orig <= p_val | p_lower_orig <= p_val,] - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::geom_bar(data = table_mean_results_dc_filter, ggplot2::aes(x = unified_int, y = enrichm, fill = type_int), stat = 'identity', show.legend = F) - pl <- pl + ggplot2::coord_flip() - pl <- pl + ggplot2::theme_bw() - pl <- pl + ggplot2::labs(y = 'enrichment/depletion') - pl - - bpl <- ggplot2::ggplot() - bpl <- bpl + ggplot2::geom_bar(data = table_mean_results_dc_filter, ggplot2::aes(x = unified_int, y = original, fill = type_int), stat = 'identity', show.legend = T) - bpl <- bpl + ggplot2::coord_flip() - bpl <- bpl + ggplot2::theme_bw() + ggplot2::theme(axis.text.y = element_blank()) - bpl <- bpl + ggplot2::labs(y = '# of interactions') - bpl - - combo_plot <- cowplot::plot_grid(pl, bpl, ncol = 2, rel_heights = c(1), rel_widths = c(3,1.5), align = 'h') - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = combo_plot, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +cellProximityBarplot <- function(gobject, + CPscore, + min_orig_ints = 5, + min_sim_ints = 5, + p_val = 0.05, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "cellProximityBarplot") { + table_mean_results_dc <- CPscore$enrichm_res + + ## filter to remove low number of cell-cell proximity interactions ## + + # data.table variables + original <- simulations <- p_higher_orig <- p_lower_orig <- enrichm <- + type_int <- unified_int <- NULL + + table_mean_results_dc_filter <- table_mean_results_dc[ + original >= min_orig_ints & simulations >= min_sim_ints, ] + table_mean_results_dc_filter <- table_mean_results_dc_filter[ + p_higher_orig <= p_val | p_lower_orig <= p_val, ] + 
+ pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_bar( + data = table_mean_results_dc_filter, + ggplot2::aes(x = unified_int, y = enrichm, fill = type_int), + stat = "identity", show.legend = FALSE) + pl <- pl + ggplot2::coord_flip() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::labs(y = "enrichment/depletion") + pl + + bpl <- ggplot2::ggplot() + bpl <- bpl + ggplot2::geom_bar( + data = table_mean_results_dc_filter, + ggplot2::aes(x = unified_int, y = original, fill = type_int), + stat = "identity", show.legend = TRUE) + bpl <- bpl + ggplot2::coord_flip() + bpl <- bpl + ggplot2::theme_bw() + ggplot2::theme( + axis.text.y = element_blank()) + bpl <- bpl + ggplot2::labs(y = "# of interactions") + bpl + + combo_plot <- cowplot::plot_grid( + pl, bpl, ncol = 2, rel_heights = c(1), + rel_widths = c(3, 1.5), align = "h") + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = combo_plot, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title cellProximityHeatmap @@ -72,99 +80,109 @@ cellProximityBarplot = function(gobject, #' @param CPscore CPscore, output from cellProximityEnrichment() #' @param scale scale cell-cell proximity interaction scores #' @param order_cell_types order cell types based on enrichment correlation -#' @param color_breaks numerical vector of length 3 to represent min, mean and maximum +#' @param color_breaks numerical vector of length 3 to represent min, mean +#' and maximum #' @param color_names character color vector of length 3 #' @return ggplot heatmap #' @details This function creates a heatmap that shows the spatial proximity #' enrichment or depletion of cell type pairs. 
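A typical call pattern would first compute the proximity enrichment and then pass the result as CPscore, supplying length-3 vectors for the documented color_breaks (min, mean, maximum) and color_names arguments. This is a usage sketch only: `g`, the `"cell_types"` column, and the argument names in the enrichment call are assumptions based on the rest of the package, not taken from this diff.

    # hypothetical usage sketch
    cp_scores <- cellProximityEnrichment(
        gobject = g,
        cluster_column = "cell_types",
        spatial_network_name = "Delaunay_network"
    )

    # custom colors: both vectors must have length 3, as enforced by the
    # checks in the function body below
    cellProximityHeatmap(
        gobject = g,
        CPscore = cp_scores,
        scale = TRUE,
        order_cell_types = TRUE,
        color_breaks = c(-1.5, 0, 1.5),
        color_names = c("blue", "white", "red")
    )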
#' @export -cellProximityHeatmap = function(gobject, - CPscore, - scale = T, - order_cell_types = T, - color_breaks = NULL, - color_names = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'cellProximityHeatmap') { - - - enrich_res = CPscore$enrichm_res - - # data.table variables - first_type = second_type = unified_int = NULL - - enrich_res[, first_type := strsplit(x = as.character(unified_int), split = '--')[[1]][1], by = 1:nrow(enrich_res)] - enrich_res[, second_type := strsplit(x = as.character(unified_int), split = '--')[[1]][2], by = 1:nrow(enrich_res)] - - # create matrix - enrich_mat = data.table::dcast.data.table(data = enrich_res,formula = first_type~second_type, value.var = 'enrichm') - matrix_d <- as.matrix(enrich_mat[,-1]); rownames(matrix_d) = as.vector(enrich_mat[[1]]) - t_matrix_d <- t_flex(matrix_d) - - # fill in NAs based on values in upper and lower matrix triangle - t_matrix_d[upper.tri(t_matrix_d)][is.na(t_matrix_d[upper.tri(t_matrix_d)])] = matrix_d[upper.tri(matrix_d)][is.na(t_matrix_d[upper.tri(t_matrix_d)])] - t_matrix_d[lower.tri(t_matrix_d)][is.na(t_matrix_d[lower.tri(t_matrix_d)])] = matrix_d[lower.tri(matrix_d)][is.na(t_matrix_d[lower.tri(t_matrix_d)])] - t_matrix_d[is.na(t_matrix_d)] = 0 - final_matrix = t_matrix_d - - # scale data - if(scale == TRUE) { - final_matrix <- t_flex(scale(t_flex(final_matrix))) - final_matrix <- t_flex(final_matrix) - final_matrix[lower.tri(final_matrix)] <- t_flex(final_matrix)[lower.tri(final_matrix)] - } - - # # if NA values, impute as mean - #if(any(is.na(final_matrix)) == TRUE) { - # myrowmeans = apply(X = final_matrix, MARGIN = 1, FUN = function(x) mean(na.omit(x))) - # mymatrixmeans = matrix(data = rep(myrowmeans, ncol(final_matrix)), nrow = nrow(final_matrix), ncol = ncol(final_matrix)) - # final_matrix[is.na(final_matrix)] = mymatrixmeans[which(is.na(final_matrix))] - #} - - # order cell types - if(order_cell_types == TRUE) { - - cordist = stats::as.dist(1-cor_flex(final_matrix)) - clus = stats::hclust(cordist) - myorder = clus$order - mylabels = clus$labels - names(mylabels) = seq_along(mylabels) - sample_order = mylabels[myorder] - - final_matrix = final_matrix[sample_order, sample_order] - } - - # create custom colors or not - if(!is.null(color_breaks) & !is.null(color_names)) { - - if(length(color_breaks) != 3 | !is.numeric(color_breaks)) { - stop('\n color_breaks needs to be a numerical vector of length 3 \n') - } - - if(length(color_names) != 3 | !is.character(color_names)) { - stop('\n color_names needs to be a character vector of length 3 \n') - } - - heatm = ComplexHeatmap::Heatmap(matrix = final_matrix, cluster_rows = F, cluster_columns = F, - col = GiottoVisuals::colorRamp2(breaks = color_breaks, colors = color_names)) - } else { - heatm = ComplexHeatmap::Heatmap(matrix = final_matrix, cluster_rows = F, cluster_columns = F) - } - - return(plot_output_handler( - gobject = gobject, - plot_object = heatm, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +cellProximityHeatmap <- function(gobject, + CPscore, + scale = TRUE, + order_cell_types = TRUE, + color_breaks = NULL, + color_names = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "cellProximityHeatmap") { + enrich_res <- CPscore$enrichm_res + + # data.table variables + first_type <- second_type <- unified_int <- 
NULL + + enrich_res[, first_type := strsplit( + x = as.character(unified_int), split = "--")[[1]][1], + by = 1:nrow(enrich_res)] + enrich_res[, second_type := strsplit( + x = as.character(unified_int), split = "--")[[1]][2], + by = 1:nrow(enrich_res)] + + # create matrix + enrich_mat <- data.table::dcast.data.table( + data = enrich_res, + formula = first_type ~ second_type, + value.var = "enrichm") + matrix_d <- as.matrix(enrich_mat[, -1]) + rownames(matrix_d) <- as.vector(enrich_mat[[1]]) + t_matrix_d <- t_flex(matrix_d) + + # fill in NAs based on values in upper and lower matrix triangle + t_matrix_d[upper.tri(t_matrix_d)][is.na(t_matrix_d[ + upper.tri(t_matrix_d)])] <- matrix_d[upper.tri(matrix_d)][ + is.na(t_matrix_d[upper.tri(t_matrix_d)])] + t_matrix_d[lower.tri(t_matrix_d)][is.na(t_matrix_d[ + lower.tri(t_matrix_d)])] <- matrix_d[lower.tri(matrix_d)][ + is.na(t_matrix_d[lower.tri(t_matrix_d)])] + t_matrix_d[is.na(t_matrix_d)] <- 0 + final_matrix <- t_matrix_d + + # scale data + if (scale == TRUE) { + final_matrix <- t_flex(scale(t_flex(final_matrix))) + final_matrix <- t_flex(final_matrix) + final_matrix[lower.tri(final_matrix)] <- t_flex(final_matrix)[ + lower.tri(final_matrix)] + } + + # order cell types + if (order_cell_types == TRUE) { + cordist <- stats::as.dist(1 - cor_flex(final_matrix)) + clus <- stats::hclust(cordist) + myorder <- clus$order + mylabels <- clus$labels + names(mylabels) <- seq_along(mylabels) + sample_order <- mylabels[myorder] + + final_matrix <- final_matrix[sample_order, sample_order] + } + + # create custom colors or not + if (!is.null(color_breaks) & !is.null(color_names)) { + if (length(color_breaks) != 3 | !is.numeric(color_breaks)) { + stop("color_breaks needs to be a numerical vector of length 3") + } + + if (length(color_names) != 3 | !is.character(color_names)) { + stop("color_names needs to be a character vector of length 3") + } + + heatm <- ComplexHeatmap::Heatmap( + matrix = final_matrix, + cluster_rows = FALSE, + cluster_columns = FALSE, + col = GiottoVisuals::colorRamp2( + breaks = color_breaks, colors = color_names) + ) + } else { + heatm <- ComplexHeatmap::Heatmap( + matrix = final_matrix, + cluster_rows = FALSE, + cluster_columns = FALSE) + } + return(plot_output_handler( + gobject = gobject, + plot_object = heatm, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -179,8 +197,10 @@ cellProximityHeatmap = function(gobject, #' @param color_depletion color for depleted cell-cell interactions #' @param color_enrichment color for enriched cell-cell interactions #' @param rescale_edge_weights rescale edge weights (boolean) -#' @param edge_weight_range_depletion numerical vector of length 2 to rescale depleted edge weights -#' @param edge_weight_range_enrichment numerical vector of length 2 to rescale enriched edge weights +#' @param edge_weight_range_depletion numerical vector of length 2 to rescale +#' depleted edge weights +#' @param edge_weight_range_enrichment numerical vector of length 2 to rescale +#' enriched edge weights #' @param layout layout algorithm to use to draw nodes and edges #' @param only_show_enrichment_edges show only the enriched pairwise scores #' @param edge_width_range range of edge width @@ -191,141 +211,153 @@ cellProximityHeatmap = function(gobject, #' @details This function creates a network that shows the spatial proximity #' enrichment or depletion of cell type pairs. 
#' @export -cellProximityNetwork = function(gobject, - CPscore, - remove_self_edges = FALSE, - self_loop_strength = 0.1, - color_depletion = 'lightgreen', - color_enrichment = 'red', - rescale_edge_weights = TRUE, - edge_weight_range_depletion = c(0.1, 1), - edge_weight_range_enrichment = c(1, 5), - layout = c('Fruchterman', 'DrL', 'Kamada-Kawai'), - only_show_enrichment_edges = F, - edge_width_range = c(0.1, 2), - node_size = 4, - node_color_code = NULL, - node_text_size = 6, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'cellProximityNetwork') { - - # extract scores - - # data.table variables - cell_1 = cell_2 = unified_int = color = size = name = NULL - - CPscores = CPscore[['enrichm_res']] - CPscores[, cell_1 := strsplit(as.character(unified_int), split = '--')[[1]][1], by = 1:nrow(CPscores)] - CPscores[, cell_2 := strsplit(as.character(unified_int), split = '--')[[1]][2], by = 1:nrow(CPscores)] - - # create igraph with enrichm as weight edges - igd = igraph::graph_from_data_frame(d = CPscores[,c('cell_1', 'cell_2', 'enrichm')], directed = F) - - if(remove_self_edges == TRUE) { - igd = igraph::simplify(graph = igd, remove.loops = TRUE, remove.multiple = FALSE) - } - - edges_sizes = igraph::get.edge.attribute(igd, 'enrichm') - post_edges_sizes = edges_sizes[edges_sizes > 0] - neg_edges_sizes = edges_sizes[edges_sizes <= 0] - - # rescale if wanted - if(rescale_edge_weights == TRUE) { - pos_edges_sizes_resc = scales::rescale(x = post_edges_sizes, to = edge_weight_range_enrichment) - neg_edges_sizes_resc = scales::rescale(x = neg_edges_sizes, to = edge_weight_range_depletion) - edges_sizes_resc = c(pos_edges_sizes_resc, neg_edges_sizes_resc) - } else { - edges_sizes_resc = c(post_edges_sizes, neg_edges_sizes) - } - - # colors - edges_colors = ifelse(edges_sizes > 0, 'enriched', 'depleted') - - - # create coordinates for layout - if(inherits(layout, c('data.frame', 'data.table'))) { - if(ncol(layout) < 2) { - stop('custom layout needs to have at least 2 columns') - } - - if(nrow(layout) != length(igraph::E(igd))) { - stop('rows of custom layout need to be the same as number of edges') - } - - } else { - layout = match.arg(arg = layout, choices = c('Fruchterman', 'DrL', 'Kamada-Kawai')) - } - - - - - #iplot = igraph::plot.igraph(igd, edge.color = edges_colors, edge.width = edges_sizes_resc, layout = coords) - - igd = igraph::set.edge.attribute(graph = igd, index = igraph::E(igd), name = 'color', value = edges_colors) - igd = igraph::set.edge.attribute(graph = igd, index = igraph::E(igd), name = 'size', value = as.numeric(edges_sizes_resc)) - - ## only show attractive edges - if(only_show_enrichment_edges == TRUE) { - colors = igraph::get.edge.attribute(igd, name = 'color') - subvertices_ids = which(colors == 'enriched') - igd = igraph::subgraph.edges(graph = igd, eids = subvertices_ids) - - # get new rescale vector (in case vector id is lost) - edges_sizes_resc = igraph::E(igd)$size - } - - ## get coordinates layouts - if(layout == 'Fruchterman') { - coords = igraph::layout_with_fr(graph = igd, weights = edges_sizes_resc) - } else if(layout == 'DrL') { - coords = igraph::layout_with_drl(graph = igd, weights = edges_sizes_resc) - } else if(layout == 'Kamada-Kawai') { - coords = igraph::layout_with_kk(graph = igd, weights = edges_sizes_resc) - } else { - stop('\n Currently no other layouts have been implemented \n') - } - - - #longDT = as.data.table(igraph::as_long_data_frame(igd)) - #return(longDT) - #return(list(igd, coords)) - - 
## create plot - gpl = ggraph::ggraph(graph = igd, layout = coords) - gpl = gpl + ggraph::geom_edge_link(ggplot2::aes(color = factor(color), edge_width = size, edge_alpha = size), show.legend = F) - - if(remove_self_edges == FALSE) { - gpl = gpl + ggraph::geom_edge_loop(ggplot2::aes(color = factor(color), edge_width = size, edge_alpha = size, strength = self_loop_strength), show.legend = F) - } - - gpl = gpl + ggraph::scale_edge_color_manual(values = c('enriched' = color_enrichment, 'depleted' = color_depletion)) - gpl = gpl + ggraph::scale_edge_width(range = edge_width_range) - gpl = gpl + ggraph::scale_edge_alpha(range = c(0.1,1)) - gpl = gpl + ggraph::geom_node_text(ggplot2::aes(label = name), repel = TRUE, size = node_text_size) - gpl = gpl + ggraph::geom_node_point(ggplot2::aes(color = name), size = node_size) - if(!is.null(node_color_code)) { - gpl = gpl + ggplot2::scale_color_manual(values = node_color_code) - } - gpl = gpl + ggplot2::theme_bw() + ggplot2::theme(panel.grid = ggplot2::element_blank(), - panel.border = ggplot2::element_blank(), - axis.title = ggplot2::element_blank(), - axis.text = ggplot2::element_blank(), - axis.ticks = ggplot2::element_blank()) - - return(plot_output_handler( - gobject = gobject, - plot_object = gpl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +cellProximityNetwork <- function(gobject, + CPscore, + remove_self_edges = FALSE, + self_loop_strength = 0.1, + color_depletion = "lightgreen", + color_enrichment = "red", + rescale_edge_weights = TRUE, + edge_weight_range_depletion = c(0.1, 1), + edge_weight_range_enrichment = c(1, 5), + layout = c("Fruchterman", "DrL", "Kamada-Kawai"), + only_show_enrichment_edges = FALSE, + edge_width_range = c(0.1, 2), + node_size = 4, + node_color_code = NULL, + node_text_size = 6, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "cellProximityNetwork") { + # extract scores + + # data.table variables + cell_1 <- cell_2 <- unified_int <- color <- size <- name <- NULL + + CPscores <- CPscore[["enrichm_res"]] + CPscores[, cell_1 := strsplit( + as.character(unified_int), split = "--")[[1]][1], by = 1:nrow(CPscores)] + CPscores[, cell_2 := strsplit( + as.character(unified_int), split = "--")[[1]][2], by = 1:nrow(CPscores)] + + # create igraph with enrichm as weight edges + igd <- igraph::graph_from_data_frame( + d = CPscores[, c("cell_1", "cell_2", "enrichm")], directed = FALSE) + + if (remove_self_edges == TRUE) { + igd <- igraph::simplify( + graph = igd, remove.loops = TRUE, remove.multiple = FALSE) + } + + edges_sizes <- igraph::get.edge.attribute(igd, "enrichm") + post_edges_sizes <- edges_sizes[edges_sizes > 0] + neg_edges_sizes <- edges_sizes[edges_sizes <= 0] + + # rescale if wanted + if (rescale_edge_weights == TRUE) { + pos_edges_sizes_resc <- scales::rescale( + x = post_edges_sizes, to = edge_weight_range_enrichment) + neg_edges_sizes_resc <- scales::rescale( + x = neg_edges_sizes, to = edge_weight_range_depletion) + edges_sizes_resc <- c(pos_edges_sizes_resc, neg_edges_sizes_resc) + } else { + edges_sizes_resc <- c(post_edges_sizes, neg_edges_sizes) + } + + # colors + edges_colors <- ifelse(edges_sizes > 0, "enriched", "depleted") + + + # create coordinates for layout + if (inherits(layout, c("data.frame", "data.table"))) { + if (ncol(layout) < 2) { + stop("custom layout needs to have at least 2 columns") + } + + if (nrow(layout) 
!= length(igraph::E(igd))) { + stop("rows of custom layout need to be the same as number of edges") + } + } else { + layout <- match.arg( + arg = layout, choices = c("Fruchterman", "DrL", "Kamada-Kawai")) + } + igd <- igraph::set.edge.attribute( + graph = igd, index = igraph::E(igd), name = "color", + value = edges_colors) + igd <- igraph::set.edge.attribute( + graph = igd, index = igraph::E(igd), name = "size", + value = as.numeric(edges_sizes_resc)) + + ## only show attractive edges + if (only_show_enrichment_edges == TRUE) { + colors <- igraph::get.edge.attribute(igd, name = "color") + subvertices_ids <- which(colors == "enriched") + igd <- igraph::subgraph.edges(graph = igd, eids = subvertices_ids) + + # get new rescale vector (in case vector id is lost) + edges_sizes_resc <- igraph::E(igd)$size + } + + ## get coordinates layouts + if (layout == "Fruchterman") { + coords <- igraph::layout_with_fr( + graph = igd, weights = edges_sizes_resc) + } else if (layout == "DrL") { + coords <- igraph::layout_with_drl( + graph = igd, weights = edges_sizes_resc) + } else if (layout == "Kamada-Kawai") { + coords <- igraph::layout_with_kk( + graph = igd, weights = edges_sizes_resc) + } else { + stop("Currently no other layouts have been implemented") + } + + ## create plot + gpl <- ggraph::ggraph(graph = igd, layout = coords) + gpl <- gpl + ggraph::geom_edge_link( + ggplot2::aes(color = factor(color), + edge_width = size, edge_alpha = size), + show.legend = FALSE) + + if (remove_self_edges == FALSE) { + gpl <- gpl + ggraph::geom_edge_loop( + ggplot2::aes(color = factor(color), edge_width = size, + edge_alpha = size, strength = self_loop_strength), + show.legend = FALSE) + } + + gpl <- gpl + ggraph::scale_edge_color_manual( + values = c("enriched" = color_enrichment, "depleted" = color_depletion)) + gpl <- gpl + ggraph::scale_edge_width(range = edge_width_range) + gpl <- gpl + ggraph::scale_edge_alpha(range = c(0.1, 1)) + gpl <- gpl + ggraph::geom_node_text( + ggplot2::aes(label = name), repel = TRUE, size = node_text_size) + gpl <- gpl + ggraph::geom_node_point( + ggplot2::aes(color = name), size = node_size) + if (!is.null(node_color_code)) { + gpl <- gpl + ggplot2::scale_color_manual(values = node_color_code) + } + gpl <- gpl + ggplot2::theme_bw() + ggplot2::theme( + panel.grid = ggplot2::element_blank(), + panel.border = ggplot2::element_blank(), + axis.title = ggplot2::element_blank(), + axis.text = ggplot2::element_blank(), + axis.ticks = ggplot2::element_blank() + ) + + return(plot_output_handler( + gobject = gobject, + plot_object = gpl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -340,558 +372,708 @@ cellProximityNetwork = function(gobject, NULL -#' @describeIn cellProximityVisPlot_internals Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode +#' @describeIn cellProximityVisPlot_internals Visualize 2D cell-cell +#' interactions according to spatial coordinates in ggplot mode #' @keywords internal .cellProximityVisPlot_2D_ggplot <- function(gobject, - interaction_name = NULL, - cluster_column = NULL, - sdimx = NULL, - sdimy = NULL, - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = TRUE, - show_other_cells = FALSE, - show_network = FALSE, - show_other_network = FALSE, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = FALSE, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', 
- coord_fix_ratio = 1, - show_legend = TRUE, - point_size_select = 2, - point_select_border_col = 'black', - point_select_border_stroke = 0.05, - point_size_other = 1, - point_alpha_other = 0.3, - point_other_border_col = 'lightgrey', - point_other_border_stroke = 0.01, - ...){ - - # data.table variables - unified_int = sdimx_begin = sdimy_begin = sdimx_end = sdimy_end = x_start = x_end = NULL - y_start = y_end = cell_ID = NULL - - if(is.null(interaction_name)) { - stop('\n you need to specific at least one interaction name, run cellProximityEnrichment \n') - } - - - cell_locations = gobject@spatial_locs - spatial_grid = gobject@spatial_grid[[spatial_grid_name]] - cell_metadata = gobject@cell_metadata - - - - spatial_network = annotateSpatialNetwork(gobject = gobject, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - cell_IDs_to_keep = unique(c(spatial_network[unified_int %in% interaction_name]$to, - spatial_network[unified_int %in% interaction_name]$from)) - - #print(cell_IDs_to_keep) - - if(show_other_cells){ - CellType <- strsplit(interaction_name,"--") - all_cell_IDs = cell_metadata[cell_metadata[[cluster_column]] == CellType[[1]][1] | - cell_metadata[[cluster_column]] == CellType[[1]][2],]$cell_ID - other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) - } - - - # annotated cell data - if(nrow(cell_metadata) == 0) { - cell_locations_metadata = cell_locations - } else { - cell_locations_metadata <- merge(cell_locations, cell_metadata,by = "cell_ID") - } - - - # first 2 dimensions need to be defined - if(is.null(sdimx) | is.null(sdimy)) { - cat('first and second dimenion need to be defined, default is first 2 \n') - sdimx = 'sdimx' - sdimy = 'sdimy' - } - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - - if(!is.null(spatial_network) & show_network == TRUE) { - if(is.null(network_color)) network_color = 'red' - if(show_other_network){ - pl <- pl + ggplot2::geom_segment(data = spatial_network[!unified_int %in% interaction_name], - aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, yend = sdimy_end), - color = 'lightgrey', size = 0.5, alpha = 0.5) - } - pl <- pl + ggplot2::geom_segment(data = spatial_network[unified_int %in% interaction_name], - aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, yend = sdimy_end), - color = network_color, size = 0.5, alpha = 0.5) - } - - if(!is.null(spatial_grid) & show_grid == TRUE) { - if(is.null(grid_color)) grid_color = 'black' - pl <- pl + ggplot2::geom_rect(data = spatial_grid, aes(xmin = x_start, xmax = x_end, ymin = y_start, ymax = y_end), - color = grid_color, fill = NA) - } - - # cell color default - if(is.null(cell_color)) { - cell_color = 'lightblue' - pl <- pl + ggplot2::geom_point(data = cell_locations[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = 'lightgrey', size = point_size_other) - pl <- pl + ggplot2::geom_point(data = cell_locations[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = cell_color, size = point_size_select) - if(show_other_cells){ - pl <- pl + ggplot2::geom_point(data = cell_locations[cell_ID %in% other_cell_IDs], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = cell_color, alpha = point_alpha_other, - size = point_size_select * 0.5) - } - } - else if (is.character(cell_color)) { - if(cell_color %in% colnames(cell_locations_metadata)) { - - if(color_as_factor == TRUE) { - factor_data = 
factor(cell_locations_metadata[[cell_color]]) - cell_locations_metadata[[cell_color]] <- factor_data - } - - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - fill = 'lightgrey', shape = 21, size = point_size_other, - color = point_other_border_col, stroke = point_other_border_stroke) - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy, fill = cell_color), - show.legend = show_legend, shape = 21, size = point_size_select, - color = point_select_border_col, stroke = point_select_border_stroke) - if(show_other_cells){ - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% other_cell_IDs], aes_string(x = sdimx, y = sdimy,fill = cell_color), - show.legend = show_legend, shape = 21, alpha = point_alpha_other, - size = point_size_select * 0.5) - } - - - - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } else if(color_as_factor == T) { - number_colors = length(unique(factor_data)) - cell_color_code = set_default_color_discrete_cell(instrs = instructions(gobject))(n = number_colors) - names(cell_color_code) = unique(factor_data) - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } else if(color_as_factor == F){ - pl <- pl + set_default_color_continuous_cell( - colors = cell_color_code, - instrs = instructions(gobject), - midpoint = NULL, - style = 'sequential', - type_default = list( - pal = c('blue', 'red') - ) - ) - } + interaction_name = NULL, + cluster_column = NULL, + sdimx = NULL, + sdimy = NULL, + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + coord_fix_ratio = 1, + show_legend = TRUE, + point_size_select = 2, + point_select_border_col = "black", + point_select_border_stroke = 0.05, + point_size_other = 1, + point_alpha_other = 0.3, + point_other_border_col = "lightgrey", + point_other_border_stroke = 0.01, + ...) 
{ + # data.table variables + unified_int <- sdimx_begin <- sdimy_begin <- sdimx_end <- sdimy_end <- + x_start <- x_end <- NULL + y_start <- y_end <- cell_ID <- NULL + + if (is.null(interaction_name)) { + stop("you need to specific at least one interaction name, run + cellProximityEnrichment") + } + + + cell_locations <- gobject@spatial_locs + spatial_grid <- gobject@spatial_grid[[spatial_grid_name]] + cell_metadata <- gobject@cell_metadata + + + + spatial_network <- annotateSpatialNetwork( + gobject = gobject, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) + + cell_IDs_to_keep <- unique(c( + spatial_network[unified_int %in% interaction_name]$to, + spatial_network[unified_int %in% interaction_name]$from + )) + + + if (show_other_cells) { + CellType <- strsplit(interaction_name, "--") + all_cell_IDs <- cell_metadata[cell_metadata[[ + cluster_column]] == CellType[[1]][1] | + cell_metadata[[cluster_column]] == CellType[[1]][2], ]$cell_ID + other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) + } + + # annotated cell data + if (nrow(cell_metadata) == 0) { + cell_locations_metadata <- cell_locations } else { - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = 'lightgrey', size = point_size_other, - color = point_other_border_col, stroke = point_other_border_stroke) - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = cell_color, size = point_size_select, - color = point_select_border_col, stroke = point_select_border_stroke) + cell_locations_metadata <- merge( + cell_locations, cell_metadata, by = "cell_ID") } - } - pl <- pl + ggplot2::theme_bw() + ggplot2::theme(plot.title = element_text(hjust = 0.5), - legend.title = element_text(size = 10), - legend.text = element_text(size = 10)) + # first 2 dimensions need to be defined + if (is.null(sdimx) | is.null(sdimy)) { + message("first and second dimenion need to be defined, default is + first 2\n") + sdimx <- "sdimx" + sdimy <- "sdimy" + } + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + + if (!is.null(spatial_network) & show_network == TRUE) { + if (is.null(network_color)) network_color <- "red" + if (show_other_network) { + pl <- pl + ggplot2::geom_segment( + data = spatial_network[!unified_int %in% interaction_name], + aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, + yend = sdimy_end), + color = "lightgrey", size = 0.5, alpha = 0.5 + ) + } + pl <- pl + ggplot2::geom_segment( + data = spatial_network[unified_int %in% interaction_name], + aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, + yend = sdimy_end), + color = network_color, size = 0.5, alpha = 0.5 + ) + } + + if (!is.null(spatial_grid) & show_grid == TRUE) { + if (is.null(grid_color)) grid_color <- "black" + pl <- pl + ggplot2::geom_rect( + data = spatial_grid, + aes(xmin = x_start, xmax = x_end, ymin = y_start, ymax = y_end), + color = grid_color, fill = NA + ) + } + + # cell color default + if (is.null(cell_color)) { + cell_color <- "lightblue" + pl <- pl + ggplot2::geom_point( + data = cell_locations[!cell_ID %in% cell_IDs_to_keep], + aes_string(x = sdimx, y = sdimy), + show.legend = show_legend, shape = 21, fill = "lightgrey", + size = point_size_other + ) + pl <- pl + ggplot2::geom_point( + data = cell_locations[cell_ID %in% cell_IDs_to_keep], + aes_string(x = 
sdimx, y = sdimy), + show.legend = show_legend, shape = 21, fill = cell_color, + size = point_size_select + ) + if (show_other_cells) { + pl <- pl + ggplot2::geom_point( + data = cell_locations[cell_ID %in% other_cell_IDs], + aes_string(x = sdimx, y = sdimy), + show.legend = show_legend, shape = 21, fill = cell_color, + alpha = point_alpha_other, + size = point_size_select * 0.5 + ) + } + } else if (is.character(cell_color)) { + if (cell_color %in% colnames(cell_locations_metadata)) { + if (color_as_factor == TRUE) { + factor_data <- factor(cell_locations_metadata[[cell_color]]) + cell_locations_metadata[[cell_color]] <- factor_data + } + + pl <- pl + ggplot2::geom_point( + data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], + aes_string(x = sdimx, y = sdimy), + fill = "lightgrey", shape = 21, size = point_size_other, + color = point_other_border_col, + stroke = point_other_border_stroke + ) + pl <- pl + ggplot2::geom_point( + data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], + aes_string(x = sdimx, y = sdimy, fill = cell_color), + show.legend = show_legend, shape = 21, size = point_size_select, + color = point_select_border_col, + stroke = point_select_border_stroke + ) + if (show_other_cells) { + pl <- pl + ggplot2::geom_point( + data = cell_locations_metadata[cell_ID %in% other_cell_IDs], + aes_string(x = sdimx, y = sdimy, fill = cell_color), + show.legend = show_legend, shape = 21, + alpha = point_alpha_other, + size = point_size_select * 0.5 + ) + } + + + + if (!is.null(cell_color_code)) { + pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) + } else if (color_as_factor == TRUE) { + number_colors <- length(unique(factor_data)) + cell_color_code <- set_default_color_discrete_cell( + instrs = instructions(gobject))(n = number_colors) + names(cell_color_code) <- unique(factor_data) + pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) + } else if (color_as_factor == FALSE) { + pl <- pl + set_default_color_continuous_cell( + colors = cell_color_code, + instrs = instructions(gobject), + midpoint = NULL, + style = "sequential", + type_default = list( + pal = c("blue", "red") + ) + ) + } + } else { + pl <- pl + ggplot2::geom_point( + data = cell_locations_metadata[ + !cell_ID %in% cell_IDs_to_keep], + aes_string(x = sdimx, y = sdimy), + show.legend = show_legend, shape = 21, + fill = "lightgrey", size = point_size_other, + color = point_other_border_col, + stroke = point_other_border_stroke + ) + pl <- pl + ggplot2::geom_point( + data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], + aes_string(x = sdimx, y = sdimy), + show.legend = show_legend, shape = 21, fill = cell_color, + size = point_size_select, + color = point_select_border_col, + stroke = point_select_border_stroke + ) + } + } - # fix coord ratio - if(!is.null(coord_fix_ratio)) { - pl <- pl + ggplot2::coord_fixed(ratio = coord_fix_ratio) - } + pl <- pl + ggplot2::theme_bw() + ggplot2::theme( + plot.title = element_text(hjust = 0.5), + legend.title = element_text(size = 10), + legend.text = element_text(size = 10) + ) + + # fix coord ratio + if (!is.null(coord_fix_ratio)) { + pl <- pl + ggplot2::coord_fixed(ratio = coord_fix_ratio) + } - pl <- pl + ggplot2::labs(x = 'x coordinates', y = 'y coordinates') + pl <- pl + ggplot2::labs(x = "x coordinates", y = "y coordinates") - return(pl) + return(pl) } -#' @describeIn cellProximityVisPlot_internals Visualize 2D cell-cell interactions according to spatial coordinates in plotly mode +#' @describeIn cellProximityVisPlot_internals 
Visualize 2D cell-cell +#' interactions according to spatial coordinates in plotly mode #' @keywords internal .cellProximityVisPlot_2D_plotly <- function(gobject, - interaction_name = NULL, - cluster_column = NULL, - sdimx = NULL, - sdimy = NULL, - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = F, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', - show_legend = T, - point_size_select = 2, - point_size_other = 1, - point_alpha_other = 0.3, - axis_scale = c("cube","real","custom"), - custom_ratio = NULL, - x_ticks = NULL, - y_ticks = NULL, - ...){ - - - # data.table variables - cell_ID = unified_int = NULL - - if(is.null(interaction_name)) { - stop('\n you need to specific at least one interaction name, run cellProximityEnrichment \n') - } - - - cell_locations = gobject@spatial_locs - spatial_grid = gobject@spatial_grid[[spatial_grid_name]] - cell_metadata = gobject@cell_metadata - - - spatial_network = annotateSpatialNetwork(gobject = gobject, spatial_network_name = spatial_network_name, cluster_column = cluster_column) - - cell_IDs_to_keep = unique(c(spatial_network[unified_int %in% interaction_name]$to, spatial_network[unified_int %in% interaction_name]$from)) - - if(show_other_cells){ - CellType <- strsplit(interaction_name,"-") - all_cell_IDs = cell_metadata[cell_metadata[[cluster_column]] == CellType[[1]][1] | - cell_metadata[[cluster_column]] == CellType[[1]][2],]$cell_ID - other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) - } - - # annotated cell data - if(nrow(cell_metadata) == 0) { - cell_locations_metadata = cell_locations - } else { - cell_locations_metadata <- merge(cell_locations, cell_metadata, by = "cell_ID") - } - - - - - # first 2 dimensions need to be defined - if(is.null(sdimx) | is.null(sdimy)) { - cat('first and second dimenion need to be defined, default is first 2 \n') - sdimx = 'sdimx' - sdimy = 'sdimy' - } - - - #create 2D plotly plot - - axis_scale = match.arg(axis_scale, c("cube","real","custom")) - - ratio = plotly_axis_scale_2D(cell_locations_metadata,sdimx = sdimx,sdimy = sdimy, - mode = axis_scale,custom_ratio = custom_ratio) - - pl <- plotly::plot_ly() - - if(show_network == TRUE) { - if(is.null(network_color)){ - network_color = "red" - } - if(show_other_network){ - pl <- pl %>% plotly::add_segments(name = paste("unselected",spatial_network_name,sep = " "), - type = "scatter", - data = spatial_network[!unified_int %in% interaction_name], - x = ~sdimx_begin, - y =~sdimy_begin, - xend = ~sdimx_end, - yend = ~sdimy_end, - line = list(color = "lightgrey", - width = 0.5), - opacity=0.3) - } - pl <- pl %>% plotly::add_segments(name = spatial_network_name, - type = "scatter", - data = spatial_network[unified_int %in% interaction_name], - x = ~sdimx_begin, - y = ~sdimy_begin, - xend = ~sdimx_end, - yend = ~sdimy_end, - line = list(color = network_color, - width = 0.5), - opacity=0.8) - - } - if(show_grid == TRUE){ - if(is.null(grid_color)) { - grid_color = 'black' - } - edges <- plotly_grid(spatial_grid) - spl <- spl %>% plotly::add_segments(name = "spatial_grid", - type = "scatter", - data = edges, - x = ~x, - y = ~y, - xend = ~x_end, - yend = ~y_end, - line = list(color = grid_color, - width = 1), - opacity=1) - - } - - - if(!is.null(cell_color)) { - if(cell_color %in% colnames(cell_locations_metadata)){ - if(is.null(cell_color_code)) { - 
number_colors=length(unique(cell_locations_metadata[[cell_color]])) - cell_color_code = set_default_color_discrete_cell(instrs = instructions(gobject))(n = number_colors) - } - cell_locations_metadata[[cell_color]] <- as.factor(cell_locations_metadata[[cell_color]]) - - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = 'markers', - #name = "selected cells", - data=cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, - color = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep][[cell_color]], - colors = cell_color_code, - marker = list(size = point_size_select)) - if(show_other_cells){ - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = 'markers', - #name = "selected cells outside network", - data=cell_locations_metadata[cell_ID %in% other_cell_IDs], - x = ~sdimx, y = ~sdimy, - color = cell_locations_metadata[cell_ID %in% other_cell_IDs][[cell_color]], - colors = cell_color_code, - opacity = point_alpha_other, - marker = list(size = point_size_select * 0.7)) - } - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = "markers", - name = "unselected cells", - data=cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, - marker = list(size = point_size_other,color = "lightgray",colors = "lightgray"), - opacity = point_alpha_other) - - } - else{ - cat('cell_color not exist!\n') - } - } - else { - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = 'markers', - name = "selected cells", - data=cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, - marker = list(size = point_size_select,color = "lightblue",colors = "lightblue")) - if(show_other_cells){ - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = 'markers', - data=cell_locations_metadata[cell_ID %in% other_cell_IDs], - x = ~sdimx, y = ~sdimy, - name = "selected cells outside network", - marker = list(size = point_size_select*0.7,color = "lightblue",colors = "lightblue"), - opacity = point_alpha_other) - } - pl <- pl %>% plotly::add_trace(type = 'scatter',mode = "markers", - name = "unselected cells", - data=cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, - marker = list(size = point_size_other,color = "lightgray",colors = "lightgray"), - opacity = point_alpha_other) - - } - - pl <- pl %>% - plotly::layout(list(xaxis = list(title = 'X',nticks = x_ticks), - yaxis = list(title = 'Y',nticks = y_ticks)), - legend = list(x = 100, y = 0.5, - font = list(family = "sans-serif",size = 12))) - return((pl)) + interaction_name = NULL, + cluster_column = NULL, + sdimx = NULL, + sdimy = NULL, + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + show_legend = TRUE, + point_size_select = 2, + point_size_other = 1, + point_alpha_other = 0.3, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + x_ticks = NULL, + y_ticks = NULL, + ...) 
{ + # data.table variables + cell_ID <- unified_int <- NULL + + if (is.null(interaction_name)) { + stop("you need to specify at least one interaction name, run + cellProximityEnrichment") + } + + + cell_locations <- gobject@spatial_locs + spatial_grid <- gobject@spatial_grid[[spatial_grid_name]] + cell_metadata <- gobject@cell_metadata + + + spatial_network <- annotateSpatialNetwork( + gobject = gobject, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column) + + cell_IDs_to_keep <- unique(c( + spatial_network[unified_int %in% interaction_name]$to, + spatial_network[unified_int %in% interaction_name]$from)) + + if (show_other_cells) { + CellType <- strsplit(interaction_name, "-") + all_cell_IDs <- cell_metadata[cell_metadata[[ + cluster_column]] == CellType[[1]][1] | + cell_metadata[[cluster_column]] == CellType[[1]][2], ]$cell_ID + other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) + } + + # annotated cell data + if (nrow(cell_metadata) == 0) { + cell_locations_metadata <- cell_locations + } else { + cell_locations_metadata <- merge( + cell_locations, cell_metadata, by = "cell_ID") + } + + + + + # first 2 dimensions need to be defined + if (is.null(sdimx) | is.null(sdimy)) { + message("first and second dimensions need to be defined, default is + first 2") + sdimx <- "sdimx" + sdimy <- "sdimy" + } + + + # create 2D plotly plot + + axis_scale <- match.arg(axis_scale, c("cube", "real", "custom")) + + ratio <- plotly_axis_scale_2D(cell_locations_metadata, + sdimx = sdimx, sdimy = sdimy, + mode = axis_scale, custom_ratio = custom_ratio + ) + + pl <- plotly::plot_ly() + + if (show_network == TRUE) { + if (is.null(network_color)) { + network_color <- "red" + } + if (show_other_network) { + pl <- pl %>% plotly::add_segments( + name = paste("unselected", spatial_network_name, sep = " "), + type = "scatter", + data = spatial_network[!unified_int %in% interaction_name], + x = ~sdimx_begin, + y = ~sdimy_begin, + xend = ~sdimx_end, + yend = ~sdimy_end, + line = list( + color = "lightgrey", + width = 0.5 + ), + opacity = 0.3 + ) + } + pl <- pl %>% plotly::add_segments( + name = spatial_network_name, + type = "scatter", + data = spatial_network[unified_int %in% interaction_name], + x = ~sdimx_begin, + y = ~sdimy_begin, + xend = ~sdimx_end, + yend = ~sdimy_end, + line = list( + color = network_color, + width = 0.5 + ), + opacity = 0.8 + ) + } + if (show_grid == TRUE) { + if (is.null(grid_color)) { + grid_color <- "black" + } + edges <- plotly_grid(spatial_grid) + pl <- pl %>% plotly::add_segments( + name = "spatial_grid", + type = "scatter", + data = edges, + x = ~x, + y = ~y, + xend = ~x_end, + yend = ~y_end, + line = list( + color = grid_color, + width = 1 + ), + opacity = 1 + ) + } + + + if (!is.null(cell_color)) { + if (cell_color %in% colnames(cell_locations_metadata)) { + if (is.null(cell_color_code)) { + number_colors <- length(unique(cell_locations_metadata[[ + cell_color]])) + cell_color_code <- set_default_color_discrete_cell( + instrs = instructions(gobject))(n = number_colors) + } + cell_locations_metadata[[cell_color]] <- as.factor( + cell_locations_metadata[[cell_color]]) + + pl <- pl %>% plotly::add_trace( + type = "scatter", mode = "markers", + # name = "selected cells", + data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, + color = cell_locations_metadata[ + cell_ID %in% cell_IDs_to_keep][[cell_color]], + colors = cell_color_code, + marker = list(size = point_size_select) + ) + if (show_other_cells) { + pl <- pl %>% 
plotly::add_trace( + type = "scatter", mode = "markers", + # name = "selected cells outside network", + data = cell_locations_metadata[cell_ID %in% other_cell_IDs], + x = ~sdimx, y = ~sdimy, + color = cell_locations_metadata[ + cell_ID %in% other_cell_IDs][[cell_color]], + colors = cell_color_code, + opacity = point_alpha_other, + marker = list(size = point_size_select * 0.7) + ) + } + pl <- pl %>% plotly::add_trace( + type = "scatter", mode = "markers", + name = "unselected cells", + data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, + marker = list( + size = point_size_other, + color = "lightgray", + colors = "lightgray"), + opacity = point_alpha_other + ) + } else { + message("cell_color not exist!") + } + } else { + pl <- pl %>% plotly::add_trace( + type = "scatter", mode = "markers", + name = "selected cells", + data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, + marker = list( + size = point_size_select, + color = "lightblue", + colors = "lightblue") + ) + if (show_other_cells) { + pl <- pl %>% plotly::add_trace( + type = "scatter", mode = "markers", + data = cell_locations_metadata[cell_ID %in% other_cell_IDs], + x = ~sdimx, y = ~sdimy, + name = "selected cells outside network", + marker = list( + size = point_size_select * 0.7, + color = "lightblue", + colors = "lightblue"), + opacity = point_alpha_other + ) + } + pl <- pl %>% plotly::add_trace( + type = "scatter", mode = "markers", + name = "unselected cells", + data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, + marker = list( + size = point_size_other, + color = "lightgray", + colors = "lightgray"), + opacity = point_alpha_other + ) + } + pl <- pl %>% + plotly::layout( + list( + xaxis = list(title = "X", nticks = x_ticks), + yaxis = list(title = "Y", nticks = y_ticks) + ), + legend = list( + x = 100, y = 0.5, + font = list(family = "sans-serif", size = 12) + ) + ) + return((pl)) } -#' @describeIn cellProximityVisPlot_internals Visualize 3D cell-cell interactions according to spatial coordinates in plotly mode +#' @describeIn cellProximityVisPlot_internals Visualize 3D cell-cell +#' interactions according to spatial coordinates in plotly mode #' @keywords internal .cellProximityVisPlot_3D_plotly <- function(gobject, - interaction_name = NULL, - cluster_column = NULL, - sdimx = NULL, - sdimy = NULL, - sdimz = NULL, - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = F, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', - show_legend = T, - point_size_select = 2, - point_size_other = 1, - point_alpha_other = 0.5, - axis_scale = c("cube","real","custom"), - custom_ratio = NULL, - x_ticks = NULL, - y_ticks = NULL, - z_ticks = NULL, - ...){ - - # data.table variables - cell_ID = unified_int = NULL - - if(is.null(interaction_name)) { - stop('\n you need to specific at least one interaction name, run cellProximityEnrichment \n') - } - - - cell_locations = gobject@spatial_locs - spatial_grid = gobject@spatial_grid[[spatial_grid_name]] - cell_metadata = gobject@cell_metadata - - - spatial_network = annotateSpatialNetwork(gobject = gobject, spatial_network_name = spatial_network_name, cluster_column = cluster_column) - - cell_IDs_to_keep = unique(c(spatial_network[unified_int %in% interaction_name]$to, spatial_network[unified_int %in% 
interaction_name]$from)) - - if(show_other_cells){ - CellType <- strsplit(interaction_name,"-") - all_cell_IDs = cell_metadata[cell_metadata[[cluster_column]] == CellType[[1]][1] | - cell_metadata[[cluster_column]] == CellType[[1]][2],]$cell_ID - other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) - } - - # annotated cell data - if(nrow(cell_metadata) == 0) { - cell_locations_metadata = cell_locations - } else { - cell_locations_metadata <- merge(cell_locations, cell_metadata, by = "cell_ID") - } - - - - - # first 2 dimensions need to be defined - if(is.null(sdimx) | is.null(sdimy)) { - cat('first and second dimenion need to be defined, default is first 2 \n') - sdimx = 'sdimx' - sdimy = 'sdimy' - } - - - - - - # if 3 dimensions are defined create a 3D plot - - # create 3D plotly plot - - pl <- plotly::plot_ly() - - axis_scale = match.arg(axis_scale, c("cube","real","custom")) - - ratio = plotly_axis_scale_3D(cell_locations_metadata,sdimx = sdimx,sdimy = sdimy,sdimz = sdimz, - mode = axis_scale,custom_ratio = custom_ratio) - - if(!is.null(cell_color)) { - if(cell_color %in% colnames(cell_locations_metadata)){ - if(is.null(cell_color_code)) { - number_colors=length(unique(cell_locations_metadata[[cell_color]])) - cell_color_code = set_default_color_discrete_cell(instrs = instructions(gobject))(n = number_colors) - } - cell_locations_metadata[[cell_color]] <- as.factor(cell_locations_metadata[[cell_color]]) - - pl <- pl %>% plotly::add_trace(type = 'scatter3d',mode = 'markers', - #name = "selected cells", - data=cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - color = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep][[cell_color]], - colors = cell_color_code, - marker = list(size = point_size_select))%>% - plotly::add_trace(type = 'scatter3d',mode = "markers", - name = "unselected cells", - data=cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - marker = list(size = point_size_other,color = "lightgray",colors = "lightgray"), - opacity = point_alpha_other) - if(show_other_cells){ - pl <- pl %>% plotly::add_trace(type = 'scatter3d',mode = 'markers', - #name = "selected cells outside network", - data=cell_locations_metadata[cell_ID %in% other_cell_IDs], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - color = cell_locations_metadata[cell_ID %in% other_cell_IDs][[cell_color]], - colors = cell_color_code, - opacity = point_alpha_other, - marker = list(size = point_size_select*0.7)) - } - } - else{ - cat('cell_color not exist!\n') - } - } - else { - pl <- pl %>% plotly::add_trace(type = 'scatter3d',mode = 'markers', - name = "selected cells", - data=cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - marker = list(size = point_size_select,color = "lightblue",colors = "lightblue"))%>% - plotly::add_trace(type = 'scatter3d',mode = "markers", - name = "unselected cells", - data=cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - marker = list(size = point_size_other,color = "lightgray",colors = "lightgray"), - opacity = point_alpha_other) - if(show_other_cells){ - pl <- pl %>% plotly::add_trace(type = 'scatter3d',mode = 'markers', - data=cell_locations_metadata[cell_ID %in% other_cell_IDs], - x = ~sdimx, y = ~sdimy, z = ~sdimz, - name = "selected cells outside network", - marker = list(size = point_size_select*0.7,color = "lightblue",colors = "lightblue"), - opacity = point_alpha_other) - } - } - 
if(!is.null(spatial_network) & show_network == TRUE) { - if(is.null(network_color)) { - network_color = 'red' - } - unselect_network <- spatial_network[!unified_int %in% interaction_name] - select_network <- spatial_network[unified_int %in% interaction_name] - pl <- pl %>% plotly::add_trace(name = "sptial network",mode = "lines", type = "scatter3d",opacity=0.5, - data = plotly_network(select_network), - x = ~x,y=~y,z=~z,inherit = F,line=list(color=network_color)) - if(show_other_network == T){ - pl <- pl %>% plotly::add_trace(name = "unselected sptial network",mode = "lines", type = "scatter3d",opacity=0.1, - data = plotly_network(unselect_network), - x = ~x,y=~y,z=~z,inherit = F,line=list(color="lightgray")) - } - - - } - - pl <- pl %>% plotly::layout(scene = list( - xaxis = list(title = "X",nticks = x_ticks), - yaxis = list(title = "Y",nticks = y_ticks), - zaxis = list(title = "Z",nticks = z_ticks), - aspectmode='manual', - aspectratio = list(x=ratio[[1]], - y=ratio[[2]], - z=ratio[[3]]))) - return(pl) + interaction_name = NULL, + cluster_column = NULL, + sdimx = NULL, + sdimy = NULL, + sdimz = NULL, + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + show_legend = TRUE, + point_size_select = 2, + point_size_other = 1, + point_alpha_other = 0.5, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + x_ticks = NULL, + y_ticks = NULL, + z_ticks = NULL, + ...) { + # data.table variables + cell_ID <- unified_int <- NULL + + if (is.null(interaction_name)) { + stop("you need to specify at least one interaction name, + run cellProximityEnrichment") + } + + + cell_locations <- gobject@spatial_locs + spatial_grid <- gobject@spatial_grid[[spatial_grid_name]] + cell_metadata <- gobject@cell_metadata + + + spatial_network <- annotateSpatialNetwork( + gobject = gobject, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column) + + cell_IDs_to_keep <- unique(c(spatial_network[ + unified_int %in% interaction_name]$to, + spatial_network[unified_int %in% interaction_name]$from)) + + if (show_other_cells) { + CellType <- strsplit(interaction_name, "-") + all_cell_IDs <- cell_metadata[cell_metadata[[ + cluster_column]] == CellType[[1]][1] | + cell_metadata[[cluster_column]] == CellType[[1]][2], ]$cell_ID + other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) + } + + # annotated cell data + if (nrow(cell_metadata) == 0) { + cell_locations_metadata <- cell_locations + } else { + cell_locations_metadata <- merge( + cell_locations, cell_metadata, by = "cell_ID") + } + + + + + # first 2 dimensions need to be defined + if (is.null(sdimx) | is.null(sdimy)) { + message("first and second dimensions need to be defined, default is + first 2") + sdimx <- "sdimx" + sdimy <- "sdimy" + } + + + + + + # if 3 dimensions are defined create a 3D plot + + # create 3D plotly plot + + pl <- plotly::plot_ly() + axis_scale <- match.arg(axis_scale, c("cube", "real", "custom")) + ratio <- plotly_axis_scale_3D(cell_locations_metadata, + sdimx = sdimx, sdimy = sdimy, sdimz = sdimz, + mode = axis_scale, custom_ratio = custom_ratio + ) + + if (!is.null(cell_color)) { + if (cell_color %in% colnames(cell_locations_metadata)) { + if (is.null(cell_color_code)) { + number_colors <- length(unique(cell_locations_metadata[[ + cell_color]])) + 
cell_color_code <- set_default_color_discrete_cell( + instrs = instructions(gobject))(n = number_colors) + } + cell_locations_metadata[[cell_color]] <- as.factor( + cell_locations_metadata[[cell_color]]) + + pl <- pl %>% + plotly::add_trace( + type = "scatter3d", mode = "markers", + data = cell_locations_metadata[ + cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + color = cell_locations_metadata[ + cell_ID %in% cell_IDs_to_keep][[cell_color]], + colors = cell_color_code, + marker = list(size = point_size_select) + ) %>% + plotly::add_trace( + type = "scatter3d", mode = "markers", + name = "unselected cells", + data = cell_locations_metadata[ + !cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + marker = list( + size = point_size_other, + color = "lightgray", + colors = "lightgray"), + opacity = point_alpha_other + ) + if (show_other_cells) { + pl <- pl %>% plotly::add_trace( + type = "scatter3d", mode = "markers", + # name = "selected cells outside network", + data = cell_locations_metadata[cell_ID %in% other_cell_IDs], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + color = cell_locations_metadata[ + cell_ID %in% other_cell_IDs][[cell_color]], + colors = cell_color_code, + opacity = point_alpha_other, + marker = list(size = point_size_select * 0.7) + ) + } + } else { + message("cell_color does not exist!") + } + } else { + pl <- pl %>% + plotly::add_trace( + type = "scatter3d", mode = "markers", + name = "selected cells", + data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + marker = list( + size = point_size_select, + color = "lightblue", + colors = "lightblue") + ) %>% + plotly::add_trace( + type = "scatter3d", mode = "markers", + name = "unselected cells", + data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + marker = list( + size = point_size_other, + color = "lightgray", + colors = "lightgray"), + opacity = point_alpha_other + ) + if (show_other_cells) { + pl <- pl %>% plotly::add_trace( + type = "scatter3d", mode = "markers", + data = cell_locations_metadata[cell_ID %in% other_cell_IDs], + x = ~sdimx, y = ~sdimy, z = ~sdimz, + name = "selected cells outside network", + marker = list( + size = point_size_select * 0.7, + color = "lightblue", + colors = "lightblue"), + opacity = point_alpha_other + ) + } + } + if (!is.null(spatial_network) & show_network == TRUE) { + if (is.null(network_color)) { + network_color <- "red" + } + unselect_network <- spatial_network[!unified_int %in% interaction_name] + select_network <- spatial_network[unified_int %in% interaction_name] + pl <- pl %>% plotly::add_trace( + name = "spatial network", mode = "lines", + type = "scatter3d", opacity = 0.5, + data = plotly_network(select_network), + x = ~x, y = ~y, z = ~z, inherit = FALSE, + line = list(color = network_color) + ) + if (show_other_network == TRUE) { + pl <- pl %>% plotly::add_trace( + name = "unselected spatial network", mode = "lines", + type = "scatter3d", opacity = 0.1, + data = plotly_network(unselect_network), + x = ~x, y = ~y, z = ~z, inherit = FALSE, + line = list(color = "lightgray") + ) + } + } + + pl <- pl %>% plotly::layout(scene = list( + xaxis = list(title = "X", nticks = x_ticks), + yaxis = list(title = "Y", nticks = y_ticks), + zaxis = list(title = "Z", nticks = z_ticks), + aspectmode = "manual", + aspectratio = list( + x = ratio[[1]], + y = ratio[[2]], + z = ratio[[3]] + ) + )) + return(pl) } #' @title cellProximityVisPlot #' @name cellProximityVisPlot 
-#' @description Visualize cell-cell interactions according to spatial coordinates +#' @description Visualize cell-cell interactions according to spatial +#' coordinates #' @param gobject giotto object #' @param interaction_name cell-cell interaction name #' @param cluster_column cluster column with cell clusters @@ -929,159 +1111,151 @@ NULL #' @details Description of parameters. #' @export cellProximityVisPlot <- function(gobject, - interaction_name = NULL, - cluster_column = NULL, - sdimx = NULL, - sdimy = NULL, - sdimz = NULL, - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = F, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', - coord_fix_ratio = 1, - show_legend = T, - point_size_select = 2, - point_select_border_col = 'black', - point_select_border_stroke = 0.05, - point_size_other = 1, - point_alpha_other = 0.3, - point_other_border_col = 'lightgrey', - point_other_border_stroke = 0.01, - axis_scale = c("cube","real","custom"), - custom_ratio = NULL, - x_ticks = NULL, - y_ticks = NULL, - z_ticks = NULL, - plot_method = c('ggplot', 'plotly'), - ...) { - - - ## decide plot method - plot_method = match.arg(plot_method, choices = c('ggplot', 'plotly')) - axis_scale = match.arg(axis_scale, c("cube","real","custom")) - - - if(plot_method == 'ggplot') { - - if(is.null(sdimx) | is.null(sdimy)) { - - warning("plot_method = ggplot, but spatial dimensions for sdimx and sdimy for 2D plotting are not given. \n - It will default to the 'sdimx' and 'sdimy' ") - sdimx = 'sdimx' - sdimy = 'sdimy' - #stop('\n ggplot is in 2D and you need to define sdimx and sdimy \n') - } - - if(length(c(sdimx, sdimy, sdimz)) == 3){ - warning("ggplot is not able to produce 3D plot! Please choose plotly method\n") - } - result = .cellProximityVisPlot_2D_ggplot(gobject = gobject, - interaction_name = interaction_name, - cluster_column = cluster_column, - sdimx = sdimx, - sdimy = sdimy, - cell_color = cell_color, - cell_color_code = cell_color_code, - color_as_factor = color_as_factor, - show_other_cells = show_other_cells, - show_network = show_network, - show_other_network = show_other_network, - network_color = network_color, - spatial_network_name = spatial_network_name, - show_grid = show_grid, - grid_color = grid_color, - spatial_grid_name = spatial_grid_name, - coord_fix_ratio = coord_fix_ratio, - show_legend = show_legend, - point_size_select = point_size_select, - point_select_border_col = point_select_border_col, - point_select_border_stroke = point_select_border_stroke, - point_size_other = point_size_other, - point_alpha_other =point_alpha_other, - point_other_border_col = point_other_border_col, - point_other_border_stroke = point_other_border_stroke, - ...) 
- - } - else if(plot_method == 'plotly') { - - if(length(c(sdimx, sdimy, sdimz)) == 3) { - - result = .cellProximityVisPlot_3D_plotly(gobject = gobject, - interaction_name = interaction_name, - cluster_column = cluster_column, - sdimx = sdimx, - sdimy = sdimy, - sdimz = sdimz, - cell_color = cell_color, - cell_color_code = cell_color_code, - color_as_factor = color_as_factor, - show_other_cells = show_other_cells, - show_network = show_network, - show_other_network = show_other_network, - network_color = network_color, - spatial_network_name = spatial_network_name, - show_grid = show_grid, - grid_color = grid_color, - spatial_grid_name = spatial_grid_name, - show_legend = show_legend, - point_size_select = point_size_select, - point_size_other = point_size_other, - point_alpha_other = point_alpha_other, - axis_scale = axis_scale, - custom_ratio = custom_ratio, - x_ticks = x_ticks, - y_ticks = y_ticks, - z_ticks = z_ticks, - ...) - - } - else { - - if(is.null(sdimx) | is.null(sdimy)) { - stop('\n plotly in 2D requires you to define sdimx and sdimy \n') - } - - ## run: visPlot_2D_plotly - result = .cellProximityVisPlot_2D_plotly(gobject = gobject, - interaction_name = interaction_name, - cluster_column = cluster_column, - sdimx = sdimx, - sdimy = sdimy, - cell_color = cell_color, - cell_color_code = cell_color_code, - color_as_factor = color_as_factor, - show_other_cells = show_other_cells, - show_network = show_network, - show_other_network = show_other_network, - network_color = network_color, - spatial_network_name = spatial_network_name, - show_grid = show_grid, - grid_color = grid_color, - spatial_grid_name = spatial_grid_name, - show_legend = show_legend, - point_size_select = point_size_select, - point_size_other = point_size_other, - point_alpha_other = point_alpha_other, - axis_scale = axis_scale, - custom_ratio = custom_ratio, - x_ticks = x_ticks, - y_ticks = y_ticks, - ...) - - - } - - } - return(result) - + interaction_name = NULL, + cluster_column = NULL, + sdimx = NULL, + sdimy = NULL, + sdimz = NULL, + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + coord_fix_ratio = 1, + show_legend = TRUE, + point_size_select = 2, + point_select_border_col = "black", + point_select_border_stroke = 0.05, + point_size_other = 1, + point_alpha_other = 0.3, + point_other_border_col = "lightgrey", + point_other_border_stroke = 0.01, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + x_ticks = NULL, + y_ticks = NULL, + z_ticks = NULL, + plot_method = c("ggplot", "plotly"), + ...) { + ## decide plot method + plot_method <- match.arg(plot_method, choices = c("ggplot", "plotly")) + axis_scale <- match.arg(axis_scale, c("cube", "real", "custom")) + + + if (plot_method == "ggplot") { + if (is.null(sdimx) | is.null(sdimy)) { + warning("plot_method = ggplot, but spatial dimensions for sdimx + and sdimy for 2D plotting are not given. \n + It will default to the 'sdimx' and 'sdimy'") + sdimx <- "sdimx" + sdimy <- "sdimy" + } + + if (length(c(sdimx, sdimy, sdimz)) == 3) { + warning("ggplot is not able to produce 3D plot! 
Please choose + plotly method") + } + result <- .cellProximityVisPlot_2D_ggplot( + gobject = gobject, + interaction_name = interaction_name, + cluster_column = cluster_column, + sdimx = sdimx, + sdimy = sdimy, + cell_color = cell_color, + cell_color_code = cell_color_code, + color_as_factor = color_as_factor, + show_other_cells = show_other_cells, + show_network = show_network, + show_other_network = show_other_network, + network_color = network_color, + spatial_network_name = spatial_network_name, + show_grid = show_grid, + grid_color = grid_color, + spatial_grid_name = spatial_grid_name, + coord_fix_ratio = coord_fix_ratio, + show_legend = show_legend, + point_size_select = point_size_select, + point_select_border_col = point_select_border_col, + point_select_border_stroke = point_select_border_stroke, + point_size_other = point_size_other, + point_alpha_other = point_alpha_other, + point_other_border_col = point_other_border_col, + point_other_border_stroke = point_other_border_stroke, + ... + ) + } else if (plot_method == "plotly") { + if (length(c(sdimx, sdimy, sdimz)) == 3) { + result <- .cellProximityVisPlot_3D_plotly( + gobject = gobject, + interaction_name = interaction_name, + cluster_column = cluster_column, + sdimx = sdimx, + sdimy = sdimy, + sdimz = sdimz, + cell_color = cell_color, + cell_color_code = cell_color_code, + color_as_factor = color_as_factor, + show_other_cells = show_other_cells, + show_network = show_network, + show_other_network = show_other_network, + network_color = network_color, + spatial_network_name = spatial_network_name, + show_grid = show_grid, + grid_color = grid_color, + spatial_grid_name = spatial_grid_name, + show_legend = show_legend, + point_size_select = point_size_select, + point_size_other = point_size_other, + point_alpha_other = point_alpha_other, + axis_scale = axis_scale, + custom_ratio = custom_ratio, + x_ticks = x_ticks, + y_ticks = y_ticks, + z_ticks = z_ticks, + ... + ) + } else { + if (is.null(sdimx) | is.null(sdimy)) { + stop("plotly in 2D requires you to define sdimx and sdimy") + } + + ## run: visPlot_2D_plotly + result <- .cellProximityVisPlot_2D_plotly( + gobject = gobject, + interaction_name = interaction_name, + cluster_column = cluster_column, + sdimx = sdimx, + sdimy = sdimy, + cell_color = cell_color, + cell_color_code = cell_color_code, + color_as_factor = color_as_factor, + show_other_cells = show_other_cells, + show_network = show_network, + show_other_network = show_other_network, + network_color = network_color, + spatial_network_name = spatial_network_name, + show_grid = show_grid, + grid_color = grid_color, + spatial_grid_name = spatial_grid_name, + show_legend = show_legend, + point_size_select = point_size_select, + point_size_other = point_size_other, + point_alpha_other = point_alpha_other, + axis_scale = axis_scale, + custom_ratio = custom_ratio, + x_ticks = x_ticks, + y_ticks = y_ticks, + ... 
+ ) + } + } + return(result) } @@ -1107,7 +1281,8 @@ cellProximityVisPlot <- function(gobject, #' @param min_cells minimum number of source cell type #' @param min_cells_expr minimum expression level for source cell type #' @param min_int_cells minimum number of interacting neighbor cell type -#' @param min_int_cells_expr minimum expression level for interacting neighbor cell type +#' @param min_int_cells_expr minimum expression level for interacting neighbor +#' cell type #' @param min_fdr minimum adjusted p-value #' @param min_spat_diff minimum absolute spatial expression difference #' @param min_log2_fc minimum log2 fold-change @@ -1116,246 +1291,295 @@ cellProximityVisPlot <- function(gobject, #' @param direction differential expression directions to keep #' @return plot #' @export -plotCellProximityFeats = function(gobject, - icfObject, - method = c('volcano', 'cell_barplot', 'cell-cell', 'cell_sankey', 'heatmap', 'dotplot'), - min_cells = 4, - min_cells_expr = 1, - min_int_cells = 4, - min_int_cells_expr = 1, - min_fdr = 0.1, - min_spat_diff = 0.2, - min_log2_fc = 0.2, - min_zscore = 2, - zscores_column = c('cell_type', 'feats'), - direction = c('both', 'up', 'down'), - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCellProximityFeats') { - - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - # print, return and save parameters - show_plot = ifelse(is.na(show_plot), readGiottoInstructions(gobject, param = 'show_plot'), show_plot) - save_plot = ifelse(is.na(save_plot), readGiottoInstructions(gobject, param = 'save_plot'), save_plot) - return_plot = ifelse(is.na(return_plot), readGiottoInstructions(gobject, param = 'return_plot'), return_plot) - - - ## first filter - filter_icf = filterInteractionChangedFeats(icfObject = icfObject, - min_cells = min_cells, - min_cells_expr = min_cells_expr, - min_int_cells = min_int_cells, - min_int_cells_expr = min_int_cells_expr, - min_fdr = min_fdr, - min_spat_diff = min_spat_diff, - min_log2_fc = min_log2_fc, - min_zscore = min_zscore, - zscores_column = zscores_column, - direction = direction) - - complete_part = filter_icf[['ICFscores']] - - ## other parameters - method = match.arg(method, choices = c('volcano', 'cell_barplot', 'cell-cell', 'cell_sankey', 'heatmap', 'dotplot')) - - - # variables - log2fc = p.adj = unif_int = N = cell_type = int_cell_type = NULL - - ## create data.table for visualization - if(method == 'volcano') { - - ## volcanoplot - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_point(data = complete_part, ggplot2::aes(x = log2fc, y = ifelse(is.infinite(-log10(p.adj)), 1000, -log10(p.adj)))) - pl = pl + ggplot2::theme_classic() - pl = pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) - pl = pl + ggplot2::labs(x = 'log2 fold-change', y = '-log10(p.adjusted)') - - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - - } else if(method == 'cell-cell') { - - nr_int_selection_scores = complete_part[, .N, by = unif_int] - order_interactions = nr_int_selection_scores[order(N)]$unif_int - - complete_part[, unif_int := factor(unif_int, order_interactions)] - - pl <- ggplot2::ggplot() - pl <- pl + 
ggplot2::geom_bar(data = complete_part, ggplot2::aes(x = unif_int, fill = unif_int)) - pl <- pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 1)) - pl <- pl + ggplot2::coord_flip() - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - - } else if(method == 'cell_barplot') { - - - # by source cell type plot - nr_source_selection_scores = complete_part[, .N, by = cell_type] - order_source = nr_source_selection_scores[order(N)]$cell_type - - complete_part[, cell_type := factor(cell_type, order_source)] - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::geom_bar(data = complete_part, ggplot2::aes(x = cell_type, fill = int_cell_type)) - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } - pl <- pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) - pl <- pl + ggplot2::labs(x = '', y = '# of features influenced by cell neighborhood') - - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'cell_sankey') { - - testalluv = complete_part[, .N, by = c('int_cell_type', 'cell_type')] - - # package check for ggalluvial - # verify if optional package is installed - package_check(pkg_name = "ggalluvial", repository = "CRAN") - - pl <- ggplot2::ggplot(testalluv, - ggplot2::aes(y = N, axis1 = cell_type, axis2 = int_cell_type)) + - ggalluvial::geom_alluvium(aes(fill = cell_type), width = 1/12) + - ggalluvial::geom_stratum(width = 1/12, fill = "black", color = "grey") + - ggplot2::scale_x_discrete(limits = c("cell type", "neighbours"), expand = c(.05, .05)) + - ggplot2::geom_label(stat = "stratum", label.strata = TRUE, size = 3) + - ggplot2::theme_classic() + ggplot2::labs(x = '', y = '# of features influenced by cell neighborhood') - - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } - - - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } - - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'dotplot') { - - changed_feats = complete_part[, .N, by = c('cell_type', 'int_cell_type')] - - changed_feats[, cell_type := factor(cell_type, unique(cell_type))] - changed_feats[, int_cell_type := factor(int_cell_type, unique(int_cell_type))] - - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() - pl = pl + ggplot2::geom_point(data = changed_feats, ggplot2::aes(x = cell_type, y = int_cell_type, size = N)) - pl = pl + ggplot2::scale_size_continuous(guide=guide_legend(title = '# of ICFs')) - pl = pl + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 1, hjust = 1)) - pl = pl + ggplot2::labs(x = 'source cell type', y = 'neighbor cell type') - - ## print plot - if(show_plot == TRUE) { - print(pl) - } - - ## save plot - if(save_plot == 
TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) +plotCellProximityFeats <- function(gobject, + icfObject, + method = c("volcano", "cell_barplot", "cell-cell", "cell_sankey", + "heatmap", "dotplot"), + min_cells = 4, + min_cells_expr = 1, + min_int_cells = 4, + min_int_cells_expr = 1, + min_fdr = 0.1, + min_spat_diff = 0.2, + min_log2_fc = 0.2, + min_zscore = 2, + zscores_column = c("cell_type", "feats"), + direction = c("both", "up", "down"), + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCellProximityFeats") { + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") } - ## return plot - if(return_plot == TRUE) { - return(pl) - } - - } else if(method == 'heatmap') { - - changed_feats = complete_part[, .N, by = c('cell_type', 'int_cell_type')] - - changed_feats[, cell_type := factor(cell_type, unique(cell_type))] - changed_feats[, int_cell_type := factor(int_cell_type, unique(int_cell_type))] - - changed_feats_d = data.table::dcast.data.table(changed_feats, cell_type~int_cell_type, value.var = 'N', fill = 0) - changed_feats_m = dt_to_matrix(changed_feats_d) - - col_fun = GiottoVisuals::colorRamp2(breaks = stats::quantile(log2(changed_feats_m+1)), - colors = c("white", 'white', "blue", "yellow", "red")) - - heatm = ComplexHeatmap::Heatmap(log2(changed_feats_m+1), col = col_fun, - row_title = 'cell_type', column_title = 'int_cell_type', heatmap_legend_param = list(title = 'log2(# DEGs)')) + # print, return and save parameters + show_plot <- ifelse( + is.na(show_plot), + readGiottoInstructions(gobject, param = "show_plot"), show_plot) + save_plot <- ifelse( + is.na(save_plot), + readGiottoInstructions(gobject, param = "save_plot"), save_plot) + return_plot <- ifelse( + is.na(return_plot), + readGiottoInstructions(gobject, param = "return_plot"), return_plot) + + + ## first filter + filter_icf <- filterInteractionChangedFeats( + icfObject = icfObject, + min_cells = min_cells, + min_cells_expr = min_cells_expr, + min_int_cells = min_int_cells, + min_int_cells_expr = min_int_cells_expr, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc, + min_zscore = min_zscore, + zscores_column = zscores_column, + direction = direction + ) - ## print plot - if(show_plot == TRUE) { - print(heatm) - } + complete_part <- filter_icf[["ICFscores"]] + + ## other parameters + method <- match.arg( + method, + choices = c("volcano", "cell_barplot", "cell-cell", "cell_sankey", + "heatmap", "dotplot")) + + + # variables + log2fc <- p.adj <- unif_int <- N <- cell_type <- int_cell_type <- NULL + + ## create data.table for visualization + if (method == "volcano") { + ## volcanoplot + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_point( + data = complete_part, + ggplot2::aes(x = log2fc, + y = ifelse(is.infinite(-log10(p.adj)), + 1000, -log10(p.adj)))) + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_vline(xintercept = 0, linetype = 2) + pl <- pl + ggplot2::labs( + x = "log2 fold-change", y = "-log10(p.adjusted)") + + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + 
return(pl) + } + } else if (method == "cell-cell") { + nr_int_selection_scores <- complete_part[, .N, by = unif_int] + order_interactions <- nr_int_selection_scores[order(N)]$unif_int + + complete_part[, unif_int := factor(unif_int, order_interactions)] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_bar( + data = complete_part, + ggplot2::aes(x = unif_int, fill = unif_int)) + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = ggplot2::element_text( + angle = 90, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::coord_flip() + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "cell_barplot") { + # by source cell type plot + nr_source_selection_scores <- complete_part[, .N, by = cell_type] + order_source <- nr_source_selection_scores[order(N)]$cell_type + + complete_part[, cell_type := factor(cell_type, order_source)] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_bar( + data = complete_part, + ggplot2::aes(x = cell_type, fill = int_cell_type)) + if (!is.null(cell_color_code)) { + pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) + } + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + pl <- pl + ggplot2::labs( + x = "", y = "# of features influenced by cell neighborhood") + + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "cell_sankey") { + testalluv <- complete_part[, .N, by = c("int_cell_type", "cell_type")] + + # package check for ggalluvial + # verify if optional package is installed + package_check(pkg_name = "ggalluvial", repository = "CRAN") + + pl <- ggplot2::ggplot( + testalluv, + ggplot2::aes(y = N, axis1 = cell_type, axis2 = int_cell_type) + ) + + ggalluvial::geom_alluvium(aes(fill = cell_type), width = 1 / 12) + + ggalluvial::geom_stratum( + width = 1 / 12, fill = "black", color = "grey") + + ggplot2::scale_x_discrete( + limits = c("cell type", "neighbours"), expand = c(.05, .05)) + + ggplot2::geom_label( + stat = "stratum", label.strata = TRUE, size = 3) + + ggplot2::theme_classic() + + ggplot2::labs( + x = "", y = "# of features influenced by cell neighborhood") + + if (!is.null(cell_color_code)) { + pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) + } + + + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "dotplot") { + changed_feats <- complete_part[ + , .N, by = c("cell_type", "int_cell_type")] + + changed_feats[, cell_type := factor(cell_type, unique(cell_type))] + changed_feats[, int_cell_type := factor( + int_cell_type, unique(int_cell_type))] + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_point( + data = changed_feats, + ggplot2::aes(x = cell_type, y = 
int_cell_type, size = N)) + pl <- pl + ggplot2::scale_size_continuous( + guide = guide_legend(title = "# of ICFs")) + pl <- pl + ggplot2::theme(axis.text.x = ggplot2::element_text( + angle = 90, vjust = 1, hjust = 1)) + pl <- pl + ggplot2::labs( + x = "source cell type", y = "neighbor cell type") + + ## print plot + if (show_plot == TRUE) { + print(pl) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(pl) + } + } else if (method == "heatmap") { + changed_feats <- complete_part[ + , .N, by = c("cell_type", "int_cell_type")] + + changed_feats[, cell_type := factor(cell_type, unique(cell_type))] + changed_feats[, int_cell_type := factor( + int_cell_type, unique(int_cell_type))] + + changed_feats_d <- data.table::dcast.data.table( + changed_feats, cell_type ~ int_cell_type, value.var = "N", fill = 0) + changed_feats_m <- dt_to_matrix(changed_feats_d) + + col_fun <- GiottoVisuals::colorRamp2( + breaks = stats::quantile(log2(changed_feats_m + 1)), + colors = c("white", "white", "blue", "yellow", "red") + ) - ## save plot - if(save_plot == TRUE) { - do.call('all_plots_save_function', c(list(gobject = gobject, plot_object = heatm, default_save_name = default_save_name), save_param)) - } + heatm <- ComplexHeatmap::Heatmap(log2(changed_feats_m + 1), + col = col_fun, + row_title = "cell_type", + column_title = "int_cell_type", + heatmap_legend_param = list(title = "log2(# DEGs)") + ) - ## return plot - if(return_plot == TRUE) { - return(heatm) + ## print plot + if (show_plot == TRUE) { + print(heatm) + } + + ## save plot + if (save_plot == TRUE) { + do.call( + "all_plots_save_function", + c(list(gobject = gobject, plot_object = heatm, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (return_plot == TRUE) { + return(heatm) + } } - - } - } @@ -1381,7 +1605,8 @@ plotCellProximityFeats = function(gobject, #' @param min_cells minimum number of source cell type #' @param min_cells_expr minimum expression level for source cell type #' @param min_int_cells minimum number of interacting neighbor cell type -#' @param min_int_cells_expr minimum expression level for interacting neighbor cell type +#' @param min_int_cells_expr minimum expression level for interacting neighbor +#' cell type #' @param min_fdr minimum adjusted p-value #' @param min_spat_diff minimum absolute spatial expression difference #' @param min_log2_fc minimum log2 fold-change @@ -1390,48 +1615,47 @@ plotCellProximityFeats = function(gobject, #' @param direction differential expression directions to keep #' @return plot #' @export -plotCPF = function(gobject, - icfObject, - method = c('volcano', 'cell_barplot', 'cell-cell', 'cell_sankey', 'heatmap', 'dotplot'), - min_cells = 5, - min_cells_expr = 1, - min_int_cells = 3, - min_int_cells_expr = 1, - min_fdr = 0.05, - min_spat_diff = 0.2, - min_log2_fc = 0.2, - min_zscore = 2, - zscores_column = c('cell_type', 'feats'), - direction = c('both', 'up', 'down'), - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCPG') { - - - plotCellProximityFeats(gobject = gobject, - icfObject = icfObject, - method = method, - min_cells = min_cells, - min_cells_expr = min_cells_expr, - min_int_cells = min_int_cells, - min_int_cells_expr = min_int_cells_expr, - min_fdr = min_fdr, - min_spat_diff = 
min_spat_diff, - min_log2_fc = min_log2_fc, - min_zscore = min_zscore, - zscores_column = zscores_column, - direction = direction, - cell_color_code = cell_color_code, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot, - save_param = save_param, - default_save_name = default_save_name) - - +plotCPF <- function(gobject, + icfObject, + method = c("volcano", "cell_barplot", "cell-cell", "cell_sankey", + "heatmap", "dotplot"), + min_cells = 5, + min_cells_expr = 1, + min_int_cells = 3, + min_int_cells_expr = 1, + min_fdr = 0.05, + min_spat_diff = 0.2, + min_log2_fc = 0.2, + min_zscore = 2, + zscores_column = c("cell_type", "feats"), + direction = c("both", "up", "down"), + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCPG") { + plotCellProximityFeats( + gobject = gobject, + icfObject = icfObject, + method = method, + min_cells = min_cells, + min_cells_expr = min_cells_expr, + min_int_cells = min_int_cells, + min_int_cells_expr = min_int_cells_expr, + min_fdr = min_fdr, + min_spat_diff = min_spat_diff, + min_log2_fc = min_log2_fc, + min_zscore = min_zscore, + zscores_column = zscores_column, + direction = direction, + cell_color_code = cell_color_code, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot, + save_param = save_param, + default_save_name = default_save_name + ) } @@ -1448,77 +1672,87 @@ plotCPF = function(gobject, #' @param ICF_feats named character vector of ICF features #' @return plot #' @export -plotInteractionChangedFeats = function(gobject, - icfObject, - source_type, - source_markers, - ICF_feats, - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotInteractionChangedFeats') { - - - # data.table variables - cell_type = int_cell_type = log2fc = NULL - - - if(!'icfObject' %in% class(icfObject)) { - stop('\n icfObject needs to be the output from findInteractionChangedFeats() or findICF() \n') - } - - ICFscores = icfObject[['ICFscores']] - - # combine feats - names(source_markers) = rep('marker', length(source_markers)) - neighbor_types = names(ICF_feats) - all_feats = c(source_markers, ICF_feats) - - # warning if there are feats selected that are not detected - detected_feats = unique(ICFscores[['feats']]) - not_detected_feats = all_feats[!all_feats %in% detected_feats] - if(length(not_detected_feats) > 0) { - cat('These selected features are not in the icfObject: \n', - not_detected_feats, '\n') - } - - # data.table set column names - feats = group = NULL - - tempDT = ICFscores[feats %in% all_feats][cell_type == source_type][int_cell_type %in% neighbor_types] - tempDT[, feats := factor(feats, levels = all_feats)] - tempDT[, group := names(all_feats[all_feats == feats]), by = 1:nrow(tempDT)] - - - if(is.null(cell_color_code)) { - mycolors = set_default_color_discrete_cell(instrs = instructions(gobject))(n = length(unique(tempDT$int_cell_type))) - names(mycolors) = unique(tempDT$int_cell_type) - } else { - mycolors = cell_color_code - } - - - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = ggplot2::element_text(size = 14, angle = 45, vjust = 1, hjust = 1), - axis.text.y = ggplot2::element_text(size = 14), - axis.title = ggplot2::element_text(size = 14)) - pl = pl + ggplot2::geom_bar(data = tempDT, ggplot2::aes(x = feats, y = log2fc, fill = int_cell_type), stat = 'identity', position = ggplot2::position_dodge()) 
- pl = pl + ggplot2::scale_fill_manual(values = mycolors) - pl = pl + ggplot2::labs(x = '', title = paste0('fold-change z-scores in ' ,source_type)) - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +plotInteractionChangedFeats <- function(gobject, + icfObject, + source_type, + source_markers, + ICF_feats, + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotInteractionChangedFeats") { + # data.table variables + cell_type <- int_cell_type <- log2fc <- NULL + + + if (!"icfObject" %in% class(icfObject)) { + stop("icfObject needs to be the output from + findInteractionChangedFeats() or findICF()") + } + + ICFscores <- icfObject[["ICFscores"]] + + # combine feats + names(source_markers) <- rep("marker", length(source_markers)) + neighbor_types <- names(ICF_feats) + all_feats <- c(source_markers, ICF_feats) + + # warning if there are feats selected that are not detected + detected_feats <- unique(ICFscores[["feats"]]) + not_detected_feats <- all_feats[!all_feats %in% detected_feats] + if (length(not_detected_feats) > 0) { + cat( + "These selected features are not in the icfObject: \n", + not_detected_feats) + } + + # data.table set column names + feats <- group <- NULL + + tempDT <- ICFscores[feats %in% all_feats][cell_type == source_type][ + int_cell_type %in% neighbor_types] + tempDT[, feats := factor(feats, levels = all_feats)] + tempDT[, group := names(all_feats[all_feats == feats]), by = 1:nrow(tempDT)] + + + if (is.null(cell_color_code)) { + mycolors <- set_default_color_discrete_cell( + instrs = instructions(gobject))(n = length(unique( + tempDT$int_cell_type))) + names(mycolors) <- unique(tempDT$int_cell_type) + } else { + mycolors <- cell_color_code + } + + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = ggplot2::element_text( + size = 14, angle = 45, vjust = 1, hjust = 1), + axis.text.y = ggplot2::element_text(size = 14), + axis.title = ggplot2::element_text(size = 14) + ) + pl <- pl + ggplot2::geom_bar( + data = tempDT, + ggplot2::aes(x = feats, y = log2fc, fill = int_cell_type), + stat = "identity", position = ggplot2::position_dodge()) + pl <- pl + ggplot2::scale_fill_manual(values = mycolors) + pl <- pl + ggplot2::labs(x = "", title = paste0( + "fold-change z-scores in ", source_type)) + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -1539,31 +1773,30 @@ plotInteractionChangedFeats = function(gobject, #' @param ICF_feats named character vector of ICF features #' @return plot #' @export -plotICF = function(gobject, - icfObject, - source_type, - source_markers, - ICF_feats, - cell_color_code = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotICF') { - - - plotInteractionChangedFeats(gobject = gobject, - icfObject = icfObject, - source_type = source_type, - source_markers = source_markers, - ICF_feats = ICF_feats, - cell_color_code = cell_color_code, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot, - 
save_param = save_param, - default_save_name = default_save_name) - +plotICF <- function(gobject, + icfObject, + source_type, + source_markers, + ICF_feats, + cell_color_code = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotICF") { + plotInteractionChangedFeats( + gobject = gobject, + icfObject = icfObject, + source_type = source_type, + source_markers = source_markers, + ICF_feats = ICF_feats, + cell_color_code = cell_color_code, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot, + save_param = save_param, + default_save_name = default_save_name + ) } @@ -1591,106 +1824,149 @@ plotICF = function(gobject, #' @return ggplot #' @export plotCombineInteractionChangedFeats <- function(gobject, - combIcfObject, - selected_interactions = NULL, - selected_feat_to_feat = NULL, - detail_plot = T, - simple_plot = F, - simple_plot_facet = c('interaction', 'feats'), - facet_scales = 'fixed', - facet_ncol = length(selected_feat_to_feat), - facet_nrow = length(selected_interactions), - colors = c('#9932CC', '#FF8C00'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCombineICF') { - - - - ## check validity - if(!'combIcfObject' %in% class(combIcfObject)) { - stop('\n combIcfObject needs to be the output from combineInteractionChangedFeats() or combineICF() \n') - } - combIcfscore = copy(combIcfObject[['combIcfscores']]) - - if(is.null(selected_interactions) | is.null(selected_feat_to_feat)) { - stop('\n You need to provide a selection of cell-cell interactions and features-features to plot \n') - } - - - # data.table variables - unif_feat_feat = unif_int = other_2 = sel_2 = other_1 = sel_1 = cols = NULL - - - subDT = combIcfscore[unif_feat_feat %in% selected_feat_to_feat & unif_int %in% selected_interactions] - - # order interactions and feat-to-feat according to input - subDT[, unif_feat_feat := factor(unif_feat_feat, levels = selected_feat_to_feat)] - subDT[, unif_int := factor(unif_int, levels = selected_interactions)] + combIcfObject, + selected_interactions = NULL, + selected_feat_to_feat = NULL, + detail_plot = TRUE, + simple_plot = FALSE, + simple_plot_facet = c("interaction", "feats"), + facet_scales = "fixed", + facet_ncol = length(selected_feat_to_feat), + facet_nrow = length(selected_interactions), + colors = c("#9932CC", "#FF8C00"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCombineICF") { + ## check validity + if (!"combIcfObject" %in% class(combIcfObject)) { + stop("combIcfObject needs to be the output from + combineInteractionChangedFeats() or combineICF()") + } + combIcfscore <- copy(combIcfObject[["combIcfscores"]]) - if(simple_plot == F) { + if (is.null(selected_interactions) | is.null(selected_feat_to_feat)) { + stop("You need to provide a selection of cell-cell interactions and + features-features to plot") + } - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - if(detail_plot == TRUE) { - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = 0, y = other_2, colour = "other cell expression"),shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = 0, y = sel_2, colour = "selected cell expression"), shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = other_1, y = 0, colour = "other cell expression"), shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sel_1, y = 0,colour ="selected cell expression"), shape = 1) - 
} - - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = other_1, y = other_2, colour = "other cell expression"),size = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sel_1, y = sel_2, colour ="selected cell expression"), size = 2) - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = other_1, xend = sel_1, - y = other_2, yend = sel_2), linetype = 2) - #pl <- pl + ggplot2::labs(x = 'feat 1 in celltype 1', y = 'feat 2 in celltype 2') - pl <- pl + ggplot2::labs(x = paste(subDT$feats_1, subDT$cell_type_1, sep = " in ") - , y = paste(subDT$feats_2, subDT$cell_type_2, sep = " in ")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values = colors) - pl <- pl + ggplot2::facet_wrap(~unif_feat_feat+unif_int, nrow = facet_nrow, ncol = facet_ncol, - scales = facet_scales) - - }else { - - simple_plot_facet = match.arg(arg = simple_plot_facet, choices = c('interaction', 'feats')) - - if(simple_plot_facet == 'interaction') { - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = sum(c(other_1, other_2)), xend = sum(c(sel_1, sel_2)), - y = unif_feat_feat, yend = unif_feat_feat), linetype = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sum(c(other_1, other_2)), y = unif_feat_feat,colour = "other cell expression")) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sum(c(sel_1, sel_2)), y = unif_feat_feat,colour ="selected cell expression")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values=cols) - pl <- pl + ggplot2::facet_wrap(~unif_int, scales = facet_scales) - pl <- pl + ggplot2::labs(x = 'interactions', y = 'feat-feat') + # data.table variables + unif_feat_feat <- unif_int <- other_2 <- sel_2 <- other_1 <- sel_1 <- + cols <- NULL + + + subDT <- combIcfscore[unif_feat_feat %in% selected_feat_to_feat & + unif_int %in% selected_interactions] + + # order interactions and feat-to-feat according to input + subDT[, unif_feat_feat := factor( + unif_feat_feat, levels = selected_feat_to_feat)] + subDT[, unif_int := factor(unif_int, levels = selected_interactions)] + + if (simple_plot == FALSE) { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + + if (detail_plot == TRUE) { + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = 0, y = other_2, colour = "other cell expression"), + shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = 0, y = sel_2, colour = "selected cell expression"), + shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = other_1, y = 0, colour = "other cell expression"), + shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sel_1, y = 0, colour = "selected cell expression"), + shape = 1) + } + + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = other_1, y = other_2, colour = "other cell expression"), + size = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sel_1, y = sel_2, colour = "selected cell expression"), + size = 2) + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = other_1, xend = sel_1, + y = other_2, yend = sel_2 + ), linetype = 2) + pl <- pl + ggplot2::labs( + x = paste(subDT$feats_1, subDT$cell_type_1, sep = " in "), + y = paste(subDT$feats_2, subDT$cell_type_2, sep = " in ") + ) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = colors) + pl <- pl + ggplot2::facet_wrap(~ unif_feat_feat + unif_int, + nrow = facet_nrow, ncol = facet_ncol, + scales = facet_scales + ) } else { - pl <- ggplot2::ggplot() - pl <- pl + 
ggplot2::theme_bw() - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = sum(c(other_1, other_2)), xend = sum(c(sel_1, sel_2)), - y = unif_int, yend = unif_int), linetype = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sum(c(other_1, other_2)), y = unif_int, colour = "other cell expression")) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = sum(c(sel_1, sel_2)), y = unif_int, colour ="selected cell expression")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values=cols) - pl <- pl + ggplot2::facet_wrap(~unif_feat_feat, scales = facet_scales) - pl <- pl + ggplot2::labs(x = 'feat-feat', y = 'interactions') - } - } - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + simple_plot_facet <- match.arg( + arg = simple_plot_facet, choices = c("interaction", "feats")) + + if (simple_plot_facet == "interaction") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = sum(c(other_1, other_2)), xend = sum(c(sel_1, sel_2)), + y = unif_feat_feat, yend = unif_feat_feat + ), linetype = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sum(c(other_1, other_2)), y = unif_feat_feat, + colour = "other cell expression")) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sum(c(sel_1, sel_2)), y = unif_feat_feat, + colour = "selected cell expression")) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = cols) + pl <- pl + ggplot2::facet_wrap(~unif_int, scales = facet_scales) + pl <- pl + ggplot2::labs(x = "interactions", y = "feat-feat") + } else { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = sum(c(other_1, other_2)), xend = sum(c(sel_1, sel_2)), + y = unif_int, yend = unif_int + ), linetype = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sum(c(other_1, other_2)), y = unif_int, + colour = "other cell expression")) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = sum(c(sel_1, sel_2)), y = unif_int, + colour = "selected cell expression")) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = cols) + pl <- pl + ggplot2::facet_wrap( + ~unif_feat_feat, scales = facet_scales) + pl <- pl + ggplot2::labs(x = "feat-feat", y = "interactions") + } + } + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -1717,37 +1993,39 @@ plotCombineInteractionChangedFeats <- function(gobject, #' @return ggplot #' @export plotCombineICF <- function(gobject, - combIcfObject, - selected_interactions = NULL, - selected_feat_to_feat = NULL, - detail_plot = T, - simple_plot = F, - simple_plot_facet = c('interaction', 'feats'), - facet_scales = 'fixed', - facet_ncol = length(selected_feat_to_feat), - facet_nrow = length(selected_interactions), - colors = c('#9932CC', '#FF8C00'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCombineICF') { - - plotCombineInteractionChangedFeats(combIcfObject = combIcfObject, - selected_interactions = 
selected_interactions, - selected_feat_to_feat = selected_feat_to_feat, - detail_plot = detail_plot, - simple_plot = simple_plot, - simple_plot_facet = simple_plot_facet, - facet_scales = facet_scales, - facet_ncol = facet_ncol, - facet_nrow = facet_nrow, - colors = colors, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot, - save_param = save_param, - default_save_name = default_save_name) + combIcfObject, + selected_interactions = NULL, + selected_feat_to_feat = NULL, + detail_plot = TRUE, + simple_plot = FALSE, + simple_plot_facet = c("interaction", "feats"), + facet_scales = "fixed", + facet_ncol = length(selected_feat_to_feat), + facet_nrow = length(selected_interactions), + colors = c("#9932CC", "#FF8C00"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCombineICF") { + plotCombineInteractionChangedFeats( + gobject = gobject, + combIcfObject = combIcfObject, + selected_interactions = selected_interactions, + selected_feat_to_feat = selected_feat_to_feat, + detail_plot = detail_plot, + simple_plot = simple_plot, + simple_plot_facet = simple_plot_facet, + facet_scales = facet_scales, + facet_ncol = facet_ncol, + facet_nrow = facet_nrow, + colors = colors, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot, + save_param = save_param, + default_save_name = default_save_name + ) } @@ -1766,12 +2044,14 @@ plotCombineICF <- function(gobject, #' @title plotCombineCellCellCommunication #' @name plotCombineCellCellCommunication -#' @description Create visualization for combined (pairwise) cell proximity gene scores +#' @description Create visualization for combined (pairwise) cell proximity +#' gene scores #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @param combCCcom combined communcation scores, output from combCCcom() #' @param selected_LR selected ligand-receptor pair -#' @param selected_cell_LR selected cell-cell interaction pair for ligand-receptor pair +#' @param selected_cell_LR selected cell-cell interaction pair for +#' ligand-receptor pair #' @param detail_plot show detailed info in both interacting cell types #' @param simple_plot show a simplified plot #' @param simple_plot_facet facet on interactions or genes with simple plot @@ -1782,113 +2062,154 @@ plotCombineICF <- function(gobject, #' @return ggplot #' @export plotCombineCellCellCommunication <- function(gobject, - combCCcom, - selected_LR = NULL, - selected_cell_LR = NULL, - detail_plot = T, - simple_plot = F, - simple_plot_facet = c('interaction', 'genes'), - facet_scales = 'fixed', - facet_ncol = length(selected_LR), - facet_nrow = length(selected_cell_LR), - colors = c('#9932CC', '#FF8C00'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCombineCellCellCommunication') { - - - - # data.table variables - LR_comb = LR_cell_comb = lig_expr = lig_expr_spat = rec_expr = rec_expr_spat = LR_expr = LR_expr_spat = NULL - - ## check validity - if(is.null(selected_cell_LR) | is.null(selected_LR)) { - stop('\n You need to provide a selection of cell-cell interactions and genes-genes to plot \n') - } - - subDT = combCCcom[LR_comb %in% selected_LR & LR_cell_comb %in% selected_cell_LR] - - # order interactions and gene-to-gene according to input - subDT[, LR_comb := factor(LR_comb, levels = selected_LR)] - subDT[, LR_cell_comb := factor(LR_cell_comb, levels = selected_cell_LR)] - - if(simple_plot == F) { - - pl <- 
ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - - if(detail_plot == TRUE) { - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = 0, y = lig_expr, colour = "overall cell expression"),shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = 0, y = lig_expr_spat, colour = "spatial cell expression"), shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = rec_expr, y = 0, colour = "overall cell expression"), shape = 1) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = rec_expr_spat, y = 0,colour ="spatial cell expression"), shape = 1) - } - - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = rec_expr, y = lig_expr, colour = "overall cell expression"),size = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = rec_expr_spat, y = lig_expr_spat, colour ="spatial cell expression"), size = 2) - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = rec_expr, xend = rec_expr_spat, - y = lig_expr, yend = lig_expr_spat), linetype = 2) - #pl <- pl + ggplot2::labs(x = 'gene 1 in celltype 1', y = 'gene 2 in celltype 2') - pl <- pl + ggplot2::labs(x = paste(subDT$receptor, subDT$rec_cell_type, sep = " in ") - , y = paste(subDT$ligand, subDT$lig_cell_type, sep = " in ")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values = colors) - pl <- pl + ggplot2::facet_wrap(~LR_comb+LR_cell_comb, nrow = facet_nrow, ncol = facet_ncol, - scales = facet_scales) - - }else { - - simple_plot_facet = match.arg(arg = simple_plot_facet, choices = c('interaction', 'genes')) - - if(simple_plot_facet == 'interaction') { - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = LR_expr, xend = LR_expr_spat, - y = LR_comb, yend = LR_comb), linetype = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = LR_expr, y = LR_comb, colour = "overall cell expression")) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = LR_expr_spat, y = LR_comb,colour ="spatial cell expression")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values=colors) - pl <- pl + ggplot2::facet_wrap(~LR_cell_comb, scales = 'fixed') - pl <- pl + ggplot2::labs(x = 'interactions', y = 'gene-gene') - pl + combCCcom, + selected_LR = NULL, + selected_cell_LR = NULL, + detail_plot = TRUE, + simple_plot = FALSE, + simple_plot_facet = c("interaction", "genes"), + facet_scales = "fixed", + facet_ncol = length(selected_LR), + facet_nrow = length(selected_cell_LR), + colors = c("#9932CC", "#FF8C00"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCombineCellCellCommunication") { + # data.table variables + LR_comb <- LR_cell_comb <- lig_expr <- lig_expr_spat <- rec_expr <- + rec_expr_spat <- LR_expr <- LR_expr_spat <- NULL + + ## check validity + if (is.null(selected_cell_LR) | is.null(selected_LR)) { + stop("You need to provide a selection of cell-cell interactions + and genes-genes to plot") + } + subDT <- combCCcom[ + LR_comb %in% selected_LR & LR_cell_comb %in% selected_cell_LR] + + # order interactions and gene-to-gene according to input + subDT[, LR_comb := factor(LR_comb, levels = selected_LR)] + subDT[, LR_cell_comb := factor(LR_cell_comb, levels = selected_cell_LR)] + + if (simple_plot == FALSE) { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + + if (detail_plot == TRUE) { + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = 0, y = lig_expr, colour = "overall cell expression"), + shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, 
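+                # spatial ligand expression drawn on the y axis (x = 0)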
+ aes(x = 0, y = lig_expr_spat, + colour = "spatial cell expression"), shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = rec_expr, y = 0, colour = "overall cell expression"), + shape = 1) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = rec_expr_spat, y = 0, + colour = "spatial cell expression"), shape = 1) + } + + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = rec_expr, y = lig_expr, colour = "overall cell expression"), + size = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = rec_expr_spat, y = lig_expr_spat, + colour = "spatial cell expression"), size = 2) + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = rec_expr, xend = rec_expr_spat, + y = lig_expr, yend = lig_expr_spat + ), linetype = 2) + pl <- pl + ggplot2::labs( + x = paste(subDT$receptor, subDT$rec_cell_type, sep = " in "), + y = paste(subDT$ligand, subDT$lig_cell_type, sep = " in ") + ) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = colors) + pl <- pl + ggplot2::facet_wrap(~ LR_comb + LR_cell_comb, + nrow = facet_nrow, ncol = facet_ncol, + scales = facet_scales + ) } else { - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_bw() - pl <- pl + ggplot2::geom_segment(data = subDT, aes(x = LR_expr, xend = LR_expr_spat, - y = LR_cell_comb, yend = LR_cell_comb), linetype = 2) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = LR_expr, y = LR_cell_comb, colour = "overall cell expression")) - pl <- pl + ggplot2::geom_point(data = subDT, aes(x = LR_expr_spat, y = LR_cell_comb, colour ="spatial cell expression")) - pl <- pl + ggplot2::scale_colour_manual(name="expression source",values=colors) - pl <- pl + ggplot2::facet_wrap(~LR_comb, scales = facet_scales) - pl <- pl + ggplot2::labs(x = 'gene-gene', y = 'interactions') - } - } - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + simple_plot_facet <- match.arg( + arg = simple_plot_facet, choices = c("interaction", "genes")) + + if (simple_plot_facet == "interaction") { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = LR_expr, xend = LR_expr_spat, + y = LR_comb, yend = LR_comb + ), linetype = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = LR_expr, y = LR_comb, + colour = "overall cell expression")) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = LR_expr_spat, y = LR_comb, + colour = "spatial cell expression")) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = colors) + pl <- pl + ggplot2::facet_wrap(~LR_cell_comb, scales = "fixed") + pl <- pl + ggplot2::labs(x = "interactions", y = "gene-gene") + pl + } else { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_bw() + pl <- pl + ggplot2::geom_segment(data = subDT, aes( + x = LR_expr, xend = LR_expr_spat, + y = LR_cell_comb, yend = LR_cell_comb + ), linetype = 2) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = LR_expr, y = LR_cell_comb, + colour = "overall cell expression")) + pl <- pl + ggplot2::geom_point( + data = subDT, + aes(x = LR_expr_spat, y = LR_cell_comb, + colour = "spatial cell expression")) + pl <- pl + ggplot2::scale_colour_manual( + name = "expression source", values = colors) + pl <- pl + ggplot2::facet_wrap(~LR_comb, scales = facet_scales) + pl <- pl + 
ggplot2::labs(x = "gene-gene", y = "interactions") + } + } + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title plotCombineCCcom #' @name plotCombineCCcom -#' @description Create visualization for combined (pairwise) cell proximity gene scores +#' @description Create visualization for combined (pairwise) cell proximity +#' gene scores #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @param combCCcom combined communcation scores, output from combCCcom() #' @param selected_LR selected ligand-receptor pair -#' @param selected_cell_LR selected cell-cell interaction pair for ligand-receptor pair +#' @param selected_cell_LR selected cell-cell interaction pair for +#' ligand-receptor pair #' @param detail_plot show detailed info in both interacting cell types #' @param simple_plot show a simplified plot #' @param simple_plot_facet facet on interactions or genes with simple plot @@ -1898,54 +2219,56 @@ plotCombineCellCellCommunication <- function(gobject, #' @param colors vector with two colors to use #' @return ggplot #' @export -plotCombineCCcom = function(gobject, - combCCcom, - selected_LR = NULL, - selected_cell_LR = NULL, - detail_plot = T, - simple_plot = F, - simple_plot_facet = c('interaction', 'genes'), - facet_scales = 'fixed', - facet_ncol = length(selected_LR), - facet_nrow = length(selected_cell_LR), - colors = c('#9932CC', '#FF8C00'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCombineCCcom') { - - - plotCombineCellCellCommunication(gobject = gobject, - combCCcom = combCCcom, - selected_LR = selected_LR, - selected_cell_LR = selected_cell_LR, - detail_plot = detail_plot, - simple_plot = simple_plot, - simple_plot_facet = simple_plot_facet, - facet_scales = facet_scales, - facet_ncol = facet_ncol, - facet_nrow = facet_nrow, - colors = colors, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot, - save_param = save_param, - default_save_name = default_save_name) - +plotCombineCCcom <- function(gobject, + combCCcom, + selected_LR = NULL, + selected_cell_LR = NULL, + detail_plot = TRUE, + simple_plot = FALSE, + simple_plot_facet = c("interaction", "genes"), + facet_scales = "fixed", + facet_ncol = length(selected_LR), + facet_nrow = length(selected_cell_LR), + colors = c("#9932CC", "#FF8C00"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCombineCCcom") { + plotCombineCellCellCommunication( + gobject = gobject, + combCCcom = combCCcom, + selected_LR = selected_LR, + selected_cell_LR = selected_cell_LR, + detail_plot = detail_plot, + simple_plot = simple_plot, + simple_plot_facet = simple_plot_facet, + facet_scales = facet_scales, + facet_ncol = facet_ncol, + facet_nrow = facet_nrow, + colors = colors, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot, + save_param = save_param, + default_save_name = default_save_name + ) } #' @title plotCCcomHeatmap #' @name plotCCcomHeatmap -#' @description Plots heatmap for ligand-receptor communication scores in cell-cell interactions +#' @description Plots heatmap for ligand-receptor communication scores in +#' cell-cell interactions #' @inheritParams data_access_params #' @inheritParams plot_output_params 
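# --- [editor's note] minimal usage sketch, not part of this diff ---
# Assumes communication scores `com_scores` obtained from exprCellCellcom()
# or spatCellCellcom() on a Giotto object `g`; names are placeholders.
plotCCcomHeatmap(
    gobject = g,
    comScores = com_scores,
    show = "LR_expr",            # one of "PI", "LR_expr", "log2fc"
    cor_method = "pearson",
    aggl_method = "ward.D2"
)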
#' @inheritParams plot_params -#' @param comScores communinication scores from \code{\link{exprCellCellcom}} or \code{\link{spatCellCellcom}} +#' @param comScores communinication scores from \code{\link{exprCellCellcom}} +#' or \code{\link{spatCellCellcom}} #' @param selected_LR selected ligand-receptor combinations -#' @param selected_cell_LR selected cell-cell combinations for ligand-receptor combinations +#' @param selected_cell_LR selected cell-cell combinations for ligand-receptor +#' combinations #' @param show_LR_names show ligand-receptor names #' @param show_cell_LR_names show cell-cell names #' @param show values to show on heatmap @@ -1953,232 +2276,277 @@ plotCombineCCcom = function(gobject, #' @param aggl_method agglomeration method used by hclust #' @return ggplot #' @export -plotCCcomHeatmap = function(gobject, - comScores, - selected_LR = NULL, - selected_cell_LR = NULL, - show_LR_names = TRUE, - show_cell_LR_names = TRUE, - show = c('PI', 'LR_expr', 'log2fc'), - cor_method = c("pearson", "kendall", "spearman"), - aggl_method = c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"), - gradient_color = NULL, - gradient_style = c('divergent', 'sequential'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCCcomHeatmap') { - - - # get parameters - cor_method = match.arg(cor_method, choices = c("pearson", "kendall", "spearman")) - aggl_method = match.arg(aggl_method, choices = c("ward.D", "ward.D2", "single", "complete", - "average", "mcquitty", "median", "centroid")) - - - # data.table variables - LR_comb = LR_cell_comb = NULL - - # plot method - if(!is.null(selected_LR) & !is.null(selected_cell_LR)) { - selDT = comScores[LR_comb %in% selected_LR & LR_cell_comb %in% selected_cell_LR] - } else if(!is.null(selected_LR)) { - selDT = comScores[LR_comb %in% selected_LR] - } else if(!is.null(selected_cell_LR)) { - selDT = comScores[LR_cell_comb %in% selected_cell_LR] - } else { - selDT = comScores - } - - # creat matrix - show = match.arg(show, choices = c('PI', 'LR_expr', 'log2fc')) - selDT_d = data.table::dcast.data.table(selDT, LR_cell_comb~LR_comb, value.var = show, fill = 0) - selDT_m = dt_to_matrix(selDT_d) - - ## cells - corclus_cells_dist = stats::as.dist(1-cor_flex(x = t_flex(selDT_m), method = cor_method)) - hclusters_cells = stats::hclust(d = corclus_cells_dist, method = aggl_method) - clus_names = rownames(selDT_m) - names(clus_names) = seq_along(clus_names) - clus_sort_names = clus_names[hclusters_cells$order] - selDT[, LR_cell_comb := factor(LR_cell_comb, clus_sort_names)] - - ## genes - corclus_genes_dist = stats::as.dist(1-cor_flex(x = selDT_m, method = cor_method)) - hclusters_genes = stats::hclust(d = corclus_genes_dist, method = aggl_method) - clus_names = colnames(selDT_m) - names(clus_names) = seq_along(clus_names) - clus_sort_names = clus_names[hclusters_genes$order] - selDT[, LR_comb := factor(LR_comb, clus_sort_names)] - - - - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_raster(data = selDT, aes_string(x = 'LR_cell_comb', - y = 'LR_comb', fill = show)) - - pl = pl + ggplot2::theme_classic() + ggplot2::theme(axis.text.x = element_blank(), - axis.ticks = element_blank(), - axis.text.y = element_blank()) - if(show_LR_names == TRUE) pl <- pl + ggplot2::theme(axis.text.y = element_text(), - axis.ticks.y = element_line()) - if(show_cell_LR_names == TRUE) pl <- pl + ggplot2::theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust = 1), - axis.ticks.x = 
element_line()) - pl = pl + set_default_color_continuous_CCcom_heatmap( - colors = gradient_color, - instrs = instructions(gobject), - style = gradient_style - ) - pl = pl + ggplot2::labs(x = 'cell-cell', y = 'ligand-receptor') - - return(plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +plotCCcomHeatmap <- function(gobject, + comScores, + selected_LR = NULL, + selected_cell_LR = NULL, + show_LR_names = TRUE, + show_cell_LR_names = TRUE, + show = c("PI", "LR_expr", "log2fc"), + cor_method = c("pearson", "kendall", "spearman"), + aggl_method = c("ward.D", "ward.D2", "single", "complete", "average", + "mcquitty", "median", "centroid"), + gradient_color = NULL, + gradient_style = c("divergent", "sequential"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCCcomHeatmap") { + # get parameters + cor_method <- match.arg( + cor_method, choices = c("pearson", "kendall", "spearman")) + aggl_method <- match.arg(aggl_method, choices = c( + "ward.D", "ward.D2", "single", "complete", + "average", "mcquitty", "median", "centroid" + )) + + + # data.table variables + LR_comb <- LR_cell_comb <- NULL + + # plot method + if (!is.null(selected_LR) & !is.null(selected_cell_LR)) { + selDT <- comScores[LR_comb %in% selected_LR & LR_cell_comb %in% + selected_cell_LR] + } else if (!is.null(selected_LR)) { + selDT <- comScores[LR_comb %in% selected_LR] + } else if (!is.null(selected_cell_LR)) { + selDT <- comScores[LR_cell_comb %in% selected_cell_LR] + } else { + selDT <- comScores + } + + # creat matrix + show <- match.arg(show, choices = c("PI", "LR_expr", "log2fc")) + selDT_d <- data.table::dcast.data.table( + selDT, LR_cell_comb ~ LR_comb, value.var = show, fill = 0) + selDT_m <- dt_to_matrix(selDT_d) + + ## cells + corclus_cells_dist <- stats::as.dist( + 1 - cor_flex(x = t_flex(selDT_m), method = cor_method)) + hclusters_cells <- stats::hclust( + d = corclus_cells_dist, method = aggl_method) + clus_names <- rownames(selDT_m) + names(clus_names) <- seq_along(clus_names) + clus_sort_names <- clus_names[hclusters_cells$order] + selDT[, LR_cell_comb := factor(LR_cell_comb, clus_sort_names)] + + ## genes + corclus_genes_dist <- stats::as.dist( + 1 - cor_flex(x = selDT_m, method = cor_method)) + hclusters_genes <- stats::hclust( + d = corclus_genes_dist, method = aggl_method) + clus_names <- colnames(selDT_m) + names(clus_names) <- seq_along(clus_names) + clus_sort_names <- clus_names[hclusters_genes$order] + selDT[, LR_comb := factor(LR_comb, clus_sort_names)] + + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_raster(data = selDT, aes_string( + x = "LR_cell_comb", + y = "LR_comb", fill = show + )) + + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text.x = element_blank(), + axis.ticks = element_blank(), + axis.text.y = element_blank() + ) + if (show_LR_names == TRUE) { + pl <- pl + ggplot2::theme( + axis.text.y = element_text(), + axis.ticks.y = element_line() + ) + } + if (show_cell_LR_names == TRUE) { + pl <- pl + ggplot2::theme( + axis.text.x = element_text(angle = 90, vjust = 1, hjust = 1), + axis.ticks.x = element_line() + ) + } + pl <- pl + set_default_color_continuous_CCcom_heatmap( + colors = gradient_color, + instrs = instructions(gobject), + style = gradient_style + ) + pl <- pl + ggplot2::labs(x = "cell-cell", y = "ligand-receptor") + + 
return(plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title plotCCcomDotplot #' @name plotCCcomDotplot -#' @description Plots dotplot for ligand-receptor communication scores in cell-cell interactions +#' @description Plots dotplot for ligand-receptor communication scores in +#' cell-cell interactions #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @inheritParams plot_params -#' @param comScores communication scores from \code{\link{exprCellCellcom}} or \code{\link{spatCellCellcom}} +#' @param comScores communication scores from \code{\link{exprCellCellcom}} +#' or \code{\link{spatCellCellcom}} #' @param selected_LR selected ligand-receptor combinations -#' @param selected_cell_LR selected cell-cell combinations for ligand-receptor combinations +#' @param selected_cell_LR selected cell-cell combinations for ligand-receptor +#' combinations #' @param show_LR_names show ligand-receptor names #' @param show_cell_LR_names show cell-cell names -#' @param cluster_on values to use for clustering of cell-cell and ligand-receptor pairs +#' @param cluster_on values to use for clustering of cell-cell and +#' ligand-receptor pairs #' @param cor_method correlation method used for clustering #' @param aggl_method agglomeration method used by hclust #' @param dot_color_gradient character. continuous colors to use. palette to #' use or vector of colors to use (minimum of 2). #' @return ggplot #' @export -plotCCcomDotplot = function(gobject, - comScores, - selected_LR = NULL, - selected_cell_LR = NULL, - show_LR_names = TRUE, - show_cell_LR_names = TRUE, - cluster_on = c('PI', 'LR_expr', 'log2fc'), - cor_method = c("pearson", "kendall", "spearman"), - aggl_method = c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"), - dot_color_gradient = NULL, - gradient_style = c('divergent', 'sequential'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotCCcomDotplot') { - - # get parameters - cor_method = match.arg(cor_method, choices = c("pearson", "kendall", "spearman")) - aggl_method = match.arg(aggl_method, choices = c("ward.D", "ward.D2", "single", "complete", - "average", "mcquitty", "median", "centroid")) - - - # data.table variables - LR_comb = LR_cell_comb = sd = NULL - - # plot method - if(!is.null(selected_LR) & !is.null(selected_cell_LR)) { - selDT = comScores[LR_comb %in% selected_LR & LR_cell_comb %in% selected_cell_LR] - } else if(!is.null(selected_LR)) { - selDT = comScores[LR_comb %in% selected_LR] - } else if(!is.null(selected_cell_LR)) { - selDT = comScores[LR_cell_comb %in% selected_cell_LR] - } else { - selDT = comScores - } - - # creat matrix - cluster_on = match.arg(cluster_on, choices = c('PI', 'LR_expr', 'log2fc')) - selDT_d = data.table::dcast.data.table(selDT, LR_cell_comb~LR_comb, value.var = cluster_on, fill = 0) - selDT_m = dt_to_matrix(selDT_d) - - # remove zero variance - sd_rows = apply(selDT_m, 1, sd) - sd_rows_zero = names(sd_rows[sd_rows == 0]) - if(length(sd_rows_zero) > 0) selDT_m = selDT_m[!rownames(selDT_m) %in% sd_rows_zero, ] - - sd_cols = apply(selDT_m, 2, sd) - sd_cols_zero = names(sd_cols[sd_cols == 0]) - if(length(sd_cols_zero) > 0) selDT_m = selDT_m[, !colnames(selDT_m) %in% sd_cols_zero] - - - - ## cells - corclus_cells_dist = 
stats::as.dist(1-cor_flex(x = t_flex(selDT_m), method = cor_method)) - hclusters_cells = stats::hclust(d = corclus_cells_dist, method = aggl_method) - clus_names = rownames(selDT_m) - names(clus_names) = seq_along(clus_names) - clus_sort_names = clus_names[hclusters_cells$order] - selDT[, LR_cell_comb := factor(LR_cell_comb, clus_sort_names)] - - ## genes - corclus_genes_dist = stats::as.dist(1-cor_flex(x = selDT_m, method = cor_method)) - hclusters_genes = stats::hclust(d = corclus_genes_dist, method = aggl_method) - clus_names = colnames(selDT_m) - names(clus_names) = seq_along(clus_names) - clus_sort_names = clus_names[hclusters_genes$order] - selDT[, LR_comb := factor(LR_comb, clus_sort_names)] - - - - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_point(data = selDT, aes_string(x = 'LR_cell_comb', - y = 'LR_comb', size = 'pvalue', color = 'log2fc')) - pl = pl + ggplot2::theme_classic() - if(show_LR_names == TRUE) pl = pl + ggplot2::theme(axis.text.y = element_text(), - axis.ticks.y = element_line()) - if(show_cell_LR_names == TRUE) pl = pl + ggplot2::theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust = 1), - axis.ticks.x = element_line()) - pl = pl + set_default_color_continuous_CCcom_dotplot( - colors = dot_color_gradient, - instrs = instructions(gobject), - midpoint = NULL, - style = gradient_style, - type = 'fill' - ) - pl = pl + ggplot2::scale_size_continuous(range = c(5, 0.5)) + - set_default_color_continuous_CCcom_dotplot( - colors = dot_color_gradient, - instrs = instructions(gobject), - midpoint = NULL, - style = gradient_style, - type = 'color' +plotCCcomDotplot <- function(gobject, + comScores, + selected_LR = NULL, + selected_cell_LR = NULL, + show_LR_names = TRUE, + show_cell_LR_names = TRUE, + cluster_on = c("PI", "LR_expr", "log2fc"), + cor_method = c("pearson", "kendall", "spearman"), + aggl_method = c("ward.D", "ward.D2", "single", "complete", "average", + "mcquitty", "median", "centroid"), + dot_color_gradient = NULL, + gradient_style = c("divergent", "sequential"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotCCcomDotplot") { + # get parameters + cor_method <- match.arg( + cor_method, choices = c("pearson", "kendall", "spearman")) + aggl_method <- match.arg(aggl_method, choices = c( + "ward.D", "ward.D2", "single", "complete", + "average", "mcquitty", "median", "centroid" + )) + + + # data.table variables + LR_comb <- LR_cell_comb <- sd <- NULL + + # plot method + if (!is.null(selected_LR) & !is.null(selected_cell_LR)) { + selDT <- comScores[LR_comb %in% selected_LR & LR_cell_comb %in% + selected_cell_LR] + } else if (!is.null(selected_LR)) { + selDT <- comScores[LR_comb %in% selected_LR] + } else if (!is.null(selected_cell_LR)) { + selDT <- comScores[LR_cell_comb %in% selected_cell_LR] + } else { + selDT <- comScores + } + + # creat matrix + cluster_on <- match.arg(cluster_on, choices = c("PI", "LR_expr", "log2fc")) + selDT_d <- data.table::dcast.data.table( + selDT, LR_cell_comb ~ LR_comb, value.var = cluster_on, fill = 0) + selDT_m <- dt_to_matrix(selDT_d) + + # remove zero variance + sd_rows <- apply(selDT_m, 1, sd) + sd_rows_zero <- names(sd_rows[sd_rows == 0]) + if (length(sd_rows_zero) > 0) selDT_m <- selDT_m[ + !rownames(selDT_m) %in% sd_rows_zero, ] + + sd_cols <- apply(selDT_m, 2, sd) + sd_cols_zero <- names(sd_cols[sd_cols == 0]) + if (length(sd_cols_zero) > 0) selDT_m <- selDT_m[ + , !colnames(selDT_m) %in% sd_cols_zero] + + + + ## cells + corclus_cells_dist <- stats::as.dist( 
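+        # distance = 1 - correlation between cell-cell pairs across L-R scores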
+ 1 - cor_flex(x = t_flex(selDT_m), method = cor_method)) + hclusters_cells <- stats::hclust( + d = corclus_cells_dist, method = aggl_method) + clus_names <- rownames(selDT_m) + names(clus_names) <- seq_along(clus_names) + clus_sort_names <- clus_names[hclusters_cells$order] + selDT[, LR_cell_comb := factor(LR_cell_comb, clus_sort_names)] + + ## genes + corclus_genes_dist <- stats::as.dist( + 1 - cor_flex(x = selDT_m, method = cor_method)) + hclusters_genes <- stats::hclust( + d = corclus_genes_dist, method = aggl_method) + clus_names <- colnames(selDT_m) + names(clus_names) <- seq_along(clus_names) + clus_sort_names <- clus_names[hclusters_genes$order] + selDT[, LR_comb := factor(LR_comb, clus_sort_names)] + + + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_point(data = selDT, aes_string( + x = "LR_cell_comb", + y = "LR_comb", size = "pvalue", color = "log2fc" + )) + pl <- pl + ggplot2::theme_classic() + if (show_LR_names == TRUE) { + pl <- pl + ggplot2::theme( + axis.text.y = element_text(), + axis.ticks.y = element_line() + ) + } + if (show_cell_LR_names == TRUE) { + pl <- pl + ggplot2::theme( + axis.text.x = element_text(angle = 90, vjust = 1, hjust = 1), + axis.ticks.x = element_line() + ) + } + pl <- pl + set_default_color_continuous_CCcom_dotplot( + colors = dot_color_gradient, + instrs = instructions(gobject), + midpoint = NULL, + style = gradient_style, + type = "fill" ) - pl = pl + ggplot2::labs(x = 'cell-cell', y = 'ligand-receptor') - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) + pl <- pl + ggplot2::scale_size_continuous(range = c(5, 0.5)) + + set_default_color_continuous_CCcom_dotplot( + colors = dot_color_gradient, + instrs = instructions(gobject), + midpoint = NULL, + style = gradient_style, + type = "color" + ) + pl <- pl + ggplot2::labs(x = "cell-cell", y = "ligand-receptor") + + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } #' @title plotRankSpatvsExpr #' @name plotRankSpatvsExpr -#' @description Plots dotplot to compare ligand-receptor rankings from spatial and expression information +#' @description Plots dotplot to compare ligand-receptor rankings from +#' spatial and expression information #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @param combCC combined communication scores from \code{\link{combCCcom}} @@ -2193,156 +2561,163 @@ plotCCcomDotplot = function(gobject, #' @param size_range size ranges of dotplot #' @param xlims x-limits, numerical vector of 2 #' @param ylims y-limits, numerical vector of 2 -#' @param selected_ranks numerical vector, will be used to print out the percentage of top spatial ranks are recovered +#' @param selected_ranks numerical vector, will be used to print out the +#' percentage of top spatial ranks are recovered #' @return ggplot #' @export -plotRankSpatvsExpr = function(gobject, - combCC, - expr_rnk_column = 'LR_expr_rnk', - spat_rnk_column = 'LR_spat_rnk', - dot_color_gradient = NULL, - midpoint = deprecated(), - gradient_midpoint = 10, - gradient_style = c('divergent', 'sequential'), - size_range = c(0.01, 1.5), - xlims = NULL, - ylims = NULL, 
- selected_ranks = c(1, 10, 20), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotRankSpatvsExpr') { - - # deprecate - if (GiottoUtils::is_present(midpoint)) { - deprecate_warn('4.0.0', - 'Giotto::plotRankSpatvsExpr(midpoint = )', - 'Giotto::plotRankSpatvsExpr(gradient_midpoint = )') - gradient_midpoint <- midpoint - } - - - # data.table variables - spt_rank = variable = value = NULL - - total_rnks = max(unique(combCC[[expr_rnk_column]])) - - rnk_list = list() - spt_list = list() - for(rnk in 1:total_rnks) { - - mytab = table(cut(sort(combCC[get(expr_rnk_column) == rnk][[spat_rnk_column]]), breaks = seq(0,total_rnks,1), labels = c(1:total_rnks))) - rnk_list[[rnk]] = mytab - spt_list[[rnk]] = names(mytab) - } - - rnk_res = data.table::as.data.table(do.call('rbind', rnk_list)) - rnk_res[, spt_rank := 1:total_rnks] - - rnk_res_m = data.table::melt.data.table(rnk_res, id.vars = 'spt_rank') - rnk_res_m[, spt_rank := as.numeric(spt_rank)] - rnk_res_m[, variable := as.numeric(variable)] - - rnk_res_m[, diff := variable - spt_rank] - - for(i in selected_ranks) { - perc_recovered = 100*(sum(rnk_res_m[abs(diff) < i]$value)/sum(rnk_res_m$value)) - cat('for top ', i, ' expression ranks, you recover ', round(perc_recovered, 2), '% of the highest spatial rank \n') - } - - - # full plot - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() + ggplot2::theme(axis.text = element_blank()) - pl = pl + ggplot2::geom_point(data = rnk_res_m, ggplot2::aes(x = variable, y = spt_rank, size = value, color = value)) - pl = pl + set_default_color_continuous_CCcom_dotplot( - colors = dot_color_gradient, - instrs = instructions(gobject), - midpoint = gradient_midpoint, - style = gradient_style, - type = 'color', - guide = guide_legend(title = '') - ) - pl = pl + ggplot2::scale_size_continuous(range = size_range, guide = "none") - pl = pl + ggplot2::labs(x = 'expression rank', y = 'spatial rank') - - if(!is.null(xlims)) { - pl = pl + xlim(xlims) - } - - if(!is.null(ylims)) { - pl = pl + ylim(ylims) - } - - return(plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +plotRankSpatvsExpr <- function(gobject, + combCC, + expr_rnk_column = "LR_expr_rnk", + spat_rnk_column = "LR_spat_rnk", + dot_color_gradient = NULL, + midpoint = deprecated(), + gradient_midpoint = 10, + gradient_style = c("divergent", "sequential"), + size_range = c(0.01, 1.5), + xlims = NULL, + ylims = NULL, + selected_ranks = c(1, 10, 20), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "plotRankSpatvsExpr") { + # deprecate + if (GiottoUtils::is_present(midpoint)) { + deprecate_warn( + "4.0.0", + "Giotto::plotRankSpatvsExpr(midpoint = )", + "Giotto::plotRankSpatvsExpr(gradient_midpoint = )" + ) + gradient_midpoint <- midpoint + } -} + # data.table variables + spt_rank <- variable <- value <- NULL + total_rnks <- max(unique(combCC[[expr_rnk_column]])) + rnk_list <- list() + spt_list <- list() + for (rnk in 1:total_rnks) { + mytab <- table(cut(sort(combCC[get(expr_rnk_column) == rnk][[ + spat_rnk_column]]), breaks = seq(0, total_rnks, 1), + labels = c(1:total_rnks))) + rnk_list[[rnk]] <- mytab + spt_list[[rnk]] <- names(mytab) + } -#' @title Create recovery plot -#' @name .plotRecovery_sub -#' @description Plots recovery plot to compare ligand-receptor 
rankings from spatial and expression information -#' @param combCC combined communinication scores from \code{\link{combCCcom}} -#' @param first_col first column to use -#' @param second_col second column to use -#' @keywords internal -.plotRecovery_sub = function(combCC, - first_col = 'LR_expr_rnk', - second_col = 'LR_spat_rnk') { + rnk_res <- data.table::as.data.table(do.call("rbind", rnk_list)) + rnk_res[, spt_rank := 1:total_rnks] + rnk_res_m <- data.table::melt.data.table(rnk_res, id.vars = "spt_rank") + rnk_res_m[, spt_rank := as.numeric(spt_rank)] + rnk_res_m[, variable := as.numeric(variable)] - # data.table variables - concord = perc = not_concord = secondrank = secondrank_perc = NULL + rnk_res_m[, diff := variable - spt_rank] - mergeDT_filt = combCC[get(first_col) == 1] + for (i in selected_ranks) { + perc_recovered <- 100 * (sum(rnk_res_m[abs(diff) < i]$value) / + sum(rnk_res_m$value)) + cat("for top ", i, " expression ranks, you recover ", + round(perc_recovered, 2), "% of the highest spatial rank") + } - mymat = matrix(data = NA, nrow = max(combCC[[second_col]]), ncol = 2) - for(i in 1:max(combCC[[second_col]])) { - mergeDT_filt[, concord := ifelse(get(second_col) <= i, 'yes', 'no')] - mytable = table(mergeDT_filt$concord) + # full plot + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.text = element_blank()) + pl <- pl + ggplot2::geom_point( + data = rnk_res_m, + ggplot2::aes(x = variable, y = spt_rank, size = value, color = value)) + pl <- pl + set_default_color_continuous_CCcom_dotplot( + colors = dot_color_gradient, + instrs = instructions(gobject), + midpoint = gradient_midpoint, + style = gradient_style, + type = "color", + guide = guide_legend(title = "") + ) + pl <- pl + ggplot2::scale_size_continuous( + range = size_range, guide = "none") + pl <- pl + ggplot2::labs(x = "expression rank", y = "spatial rank") - matching = mytable['yes'] - if(is.na(matching)) matching = 0 - mymat[i, 1] = matching + if (!is.null(xlims)) { + pl <- pl + xlim(xlims) + } - non_matching = mytable['no'] - if(is.na(non_matching)) non_matching = 0 - mymat[i, 2] = non_matching + if (!is.null(ylims)) { + pl <- pl + ylim(ylims) + } - } + return(plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) +} - mymatDT = data.table::as.data.table(mymat); colnames(mymatDT) = c('concord', 'not_concord') - mymatDT[, perc := 100*(concord / (concord+not_concord))] - mymatDT[, secondrank := 1:nrow(mymatDT)] - mymatDT[, secondrank_perc := (secondrank/max(secondrank))*100] - # percentage explained - perc_explained = mymatDT[, sum(perc)]/(100*nrow(mymat)) - cat('percentage explained = ', perc_explained) - pl = ggplot2::ggplot() - pl = pl + ggplot2::theme_classic() - pl = pl + ggplot2::geom_point(data = mymatDT, aes(x = secondrank_perc, y = perc)) - pl = pl + ggplot2::scale_x_continuous(expand = c(0,0), limits = c(0,100)) - pl = pl + ggplot2::scale_y_continuous(expand = c(0,0), limits = c(0, 100)) - pl = pl + ggplot2::geom_abline(slope = 1, intercept = 0, color = 'blue') +#' @title Create recovery plot +#' @name .plotRecovery_sub +#' @description Plots recovery plot to compare ligand-receptor rankings from +#' spatial and expression information +#' @param combCC combined communinication scores from \code{\link{combCCcom}} +#' @param first_col first column to use +#' @param second_col second column to 
use +#' @keywords internal +.plotRecovery_sub <- function(combCC, + first_col = "LR_expr_rnk", + second_col = "LR_spat_rnk") { + # data.table variables + concord <- perc <- not_concord <- secondrank <- secondrank_perc <- NULL + + mergeDT_filt <- combCC[get(first_col) == 1] + + mymat <- matrix(data = NA, nrow = max(combCC[[second_col]]), ncol = 2) + for (i in 1:max(combCC[[second_col]])) { + mergeDT_filt[, concord := ifelse(get(second_col) <= i, "yes", "no")] + mytable <- table(mergeDT_filt$concord) + + matching <- mytable["yes"] + if (is.na(matching)) matching <- 0 + mymat[i, 1] <- matching + + non_matching <- mytable["no"] + if (is.na(non_matching)) non_matching <- 0 + mymat[i, 2] <- non_matching + } + + mymatDT <- data.table::as.data.table(mymat) + colnames(mymatDT) <- c("concord", "not_concord") + mymatDT[, perc := 100 * (concord / (concord + not_concord))] + mymatDT[, secondrank := 1:nrow(mymatDT)] + mymatDT[, secondrank_perc := (secondrank / max(secondrank)) * 100] - return(pl) + # percentage explained + perc_explained <- mymatDT[, sum(perc)] / (100 * nrow(mymat)) + cat("percentage explained = ", perc_explained) + + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + pl <- pl + ggplot2::geom_point( + data = mymatDT, + aes(x = secondrank_perc, y = perc)) + pl <- pl + ggplot2::scale_x_continuous(expand = c(0, 0), limits = c(0, 100)) + pl <- pl + ggplot2::scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) + pl <- pl + ggplot2::geom_abline(slope = 1, intercept = 0, color = "blue") + + return(pl) } @@ -2351,7 +2726,8 @@ plotRankSpatvsExpr = function(gobject, #' @title plotRecovery #' @name plotRecovery -#' @description Plots recovery plot to compare ligand-receptor rankings from spatial and expression information +#' @description Plots recovery plot to compare ligand-receptor rankings from +#' spatial and expression information #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @param combCC combined communication scores from \code{\link{combCCcom}} @@ -2360,46 +2736,50 @@ plotRankSpatvsExpr = function(gobject, #' @param ground_truth what to consider as ground truth (default: spatial) #' @return ggplot #' @export -plotRecovery = function(gobject, - combCC, - expr_rnk_column = 'exprPI_rnk', - spat_rnk_column = 'spatPI_rnk', - ground_truth = c('spatial', 'expression'), - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'plotRecovery') { - - ground_truth = match.arg(ground_truth, choices = c('spatial', 'expression')) - - - if(ground_truth == 'spatial') { - - pl = .plotRecovery_sub(combCC = combCC, - first_col = spat_rnk_column, - second_col = expr_rnk_column) - pl = pl + ggplot2::labs(x = '% expression rank included', y = '% highest spatial rank recovered') - - } else if(ground_truth == 'expression') { - - pl = .plotRecovery_sub(combCC = combCC, - first_col = expr_rnk_column, - second_col = spat_rnk_column) - pl = pl + ggplot2::labs(x = '% spatial rank included', y = '% highest expression rank recovered') - - } - - return(plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) +plotRecovery <- function(gobject, + combCC, + expr_rnk_column = "exprPI_rnk", + spat_rnk_column = "spatPI_rnk", + ground_truth = c("spatial", "expression"), + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + 
default_save_name = "plotRecovery") { + ground_truth <- match.arg( + ground_truth, choices = c("spatial", "expression")) + + + if (ground_truth == "spatial") { + pl <- .plotRecovery_sub( + combCC = combCC, + first_col = spat_rnk_column, + second_col = expr_rnk_column + ) + pl <- pl + ggplot2::labs( + x = "% expression rank included", + y = "% highest spatial rank recovered") + } else if (ground_truth == "expression") { + pl <- .plotRecovery_sub( + combCC = combCC, + first_col = expr_rnk_column, + second_col = spat_rnk_column + ) + pl <- pl + ggplot2::labs( + x = "% spatial rank included", + y = "% highest expression rank recovered") + } + + return(plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) } @@ -2415,7 +2795,8 @@ plotRecovery = function(gobject, #' @title cellProximitySpatPlot2D #' @name cellProximitySpatPlot2D -#' @description Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode +#' @description Visualize 2D cell-cell interactions according to spatial +#' coordinates in ggplot mode #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @inheritParams plot_cell_params @@ -2445,237 +2826,296 @@ plotRecovery = function(gobject, #' @details Description of parameters. #' @export cellProximitySpatPlot2D <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - spat_loc_name = NULL, - interaction_name = NULL, - cluster_column = NULL, - sdimx = 'sdimx', - sdimy = 'sdimy', - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = F, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', - coord_fix_ratio = 1, - show_legend = T, - point_size_select = 2, - point_select_border_col = 'black', - point_select_border_stroke = 0.05, - point_size_other = 1, - point_alpha_other = 0.3, - point_other_border_col = 'lightgrey', - point_other_border_stroke = 0.01, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'cellProximitySpatPlot2D') { - if(is.null(interaction_name)) { - stop('\n you need to specific at least one interaction name, run cellProximityEnrichment \n') - } - - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # get information from all slots - cell_locations = get_spatial_locations(gobject = gobject, - spat_unit = spat_unit, - spat_loc_name = spat_loc_name, - output = 'data.table', - copy_obj = FALSE) - - spatial_grid = get_spatialGrid(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - name = spatial_grid_name, - return_grid_Obj = FALSE) - - cell_metadata <- getCellMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'data.table', - copy_obj = TRUE) - - spatial_network = annotateSpatialNetwork(gobject = gobject, - feat_type = feat_type, - spat_unit = spat_unit, - spatial_network_name = spatial_network_name, - cluster_column = cluster_column) - - - # data.table variables - unified_int = sdimx_begin = sdimy_begin = sdimx_end = sdimy_end = x_start = x_end = y_start = y_end = cell_ID = NULL - - cell_IDs_to_keep = 
unique(c(spatial_network[unified_int %in% interaction_name]$to, - spatial_network[unified_int %in% interaction_name]$from)) - - #print(cell_IDs_to_keep) - - if(show_other_cells){ - CellType <- strsplit(interaction_name,"--") - all_cell_IDs = cell_metadata[cell_metadata[[cluster_column]] == CellType[[1]][1] | - cell_metadata[[cluster_column]] == CellType[[1]][2],]$cell_ID - other_cell_IDs = setdiff(all_cell_IDs, cell_IDs_to_keep) - } - - - # annotated cell data () - if(nrow(cell_metadata) == 0) { - cell_locations_metadata = cell_locations - } else { - cell_locations_metadata = merge(cell_locations, cell_metadata, by = "cell_ID") - } - - - # first 2 dimensions need to be defined - if(is.null(sdimx) | is.null(sdimy)) { - cat('first and second dimenion need to be defined, default is first 2 \n') - sdimx = 'sdimx' - sdimy = 'sdimy' - } - - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() - - if(!is.null(spatial_network) & show_network == TRUE) { - if(is.null(network_color)) network_color = 'red' - if(show_other_network){ - pl <- pl + ggplot2::geom_segment(data = spatial_network[!unified_int %in% interaction_name], - aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, yend = sdimy_end), - color = 'lightgrey', size = 0.5, alpha = 0.5) - } - pl <- pl + ggplot2::geom_segment(data = spatial_network[unified_int %in% interaction_name], - aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end, yend = sdimy_end), - color = network_color, size = 0.5, alpha = 0.5) - } - - if(!is.null(spatial_grid) & show_grid == TRUE) { - if(is.null(grid_color)) grid_color = 'black' - pl <- pl + ggplot2::geom_rect(data = spatial_grid, aes(xmin = x_start, xmax = x_end, ymin = y_start, ymax = y_end), - color = grid_color, fill = NA) - } - - # cell color default - if(is.null(cell_color)) { - cell_color = 'lightblue' - pl <- pl + ggplot2::geom_point(data = cell_locations[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = 'lightgrey', size = point_size_other) - pl <- pl + ggplot2::geom_point(data = cell_locations[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = cell_color, size = point_size_select) - if(show_other_cells){ - pl <- pl + ggplot2::geom_point(data = cell_locations[cell_ID %in% other_cell_IDs], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = cell_color, alpha = point_alpha_other, - size = point_size_select * 0.5) - } - } - else if (is.character(cell_color)) { - if(cell_color %in% colnames(cell_locations_metadata)) { - - if(color_as_factor == TRUE) { - factor_data = factor(cell_locations_metadata[[cell_color]]) - cell_locations_metadata[[cell_color]] <- factor_data - } - - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - fill = 'lightgrey', shape = 21, size = point_size_other, - color = point_other_border_col, stroke = point_other_border_stroke) - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy, fill = cell_color), - show.legend = show_legend, shape = 21, size = point_size_select, - color = point_select_border_col, stroke = point_select_border_stroke) - if(show_other_cells){ - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% other_cell_IDs], aes_string(x = sdimx, y = sdimy,fill = cell_color), - show.legend = show_legend, shape = 21, alpha = 
point_alpha_other, - size = point_size_select * 0.5) - } - - - - if(!is.null(cell_color_code)) { - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } else if(color_as_factor == T) { - number_colors = length(unique(factor_data)) - cell_color_code = set_default_color_discrete_cell(instrs = instructions(gobject))(n = number_colors) - names(cell_color_code) = unique(factor_data) - pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code) - } else if(color_as_factor == F){ - pl <- pl + set_default_color_continuous_cell( - colors = cell_color_code, - instrs = instructions(gobject), - midpoint = NULL, - style = 'sequential', - type_default = list( - pal = c('blue', 'red') - ) - ) - } + spat_unit = NULL, + feat_type = NULL, + spat_loc_name = NULL, + interaction_name = NULL, + cluster_column = NULL, + sdimx = "sdimx", + sdimy = "sdimy", + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + coord_fix_ratio = 1, + show_legend = TRUE, + point_size_select = 2, + point_select_border_col = "black", + point_select_border_stroke = 0.05, + point_size_other = 1, + point_alpha_other = 0.3, + point_other_border_col = "lightgrey", + point_other_border_stroke = 0.01, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "cellProximitySpatPlot2D") { + if (is.null(interaction_name)) { + stop("you need to specific at least one interaction name, run + cellProximityEnrichment") + } + + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + + # get information from all slots + cell_locations <- getSpatialLocations( + gobject = gobject, + spat_unit = spat_unit, + name = spat_loc_name, + output = "data.table", + copy_obj = FALSE + ) + + spatial_grid <- getSpatialGrid( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + name = spatial_grid_name, + return_grid_Obj = FALSE + ) + + cell_metadata <- getCellMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "data.table", + copy_obj = TRUE + ) + + spatial_network <- annotateSpatialNetwork( + gobject = gobject, + feat_type = feat_type, + spat_unit = spat_unit, + spatial_network_name = spatial_network_name, + cluster_column = cluster_column + ) + + + # data.table variables + unified_int <- sdimx_begin <- sdimy_begin <- sdimx_end <- sdimy_end <- + x_start <- x_end <- y_start <- y_end <- cell_ID <- NULL + + cell_IDs_to_keep <- unique(c( + spatial_network[unified_int %in% interaction_name]$to, + spatial_network[unified_int %in% interaction_name]$from + )) + + if (show_other_cells) { + CellType <- strsplit(interaction_name, "--") + all_cell_IDs <- cell_metadata[cell_metadata[[ + cluster_column]] == CellType[[1]][1] | + cell_metadata[[cluster_column]] == CellType[[1]][2], ]$cell_ID + other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep) + } + + + # annotated cell data () + if (nrow(cell_metadata) == 0) { + cell_locations_metadata <- cell_locations } else { - pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy), - show.legend = show_legend, shape = 21, fill = 
'lightgrey', size = point_size_other,
-                                     color = point_other_border_col, stroke = point_other_border_stroke)
-      pl <- pl + ggplot2::geom_point(data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep], aes_string(x = sdimx, y = sdimy),
-                                     show.legend = show_legend, shape = 21, fill = cell_color, size = point_size_select,
-                                     color = point_select_border_col, stroke = point_select_border_stroke)
-    }
-
-  }
-
-  pl <- pl + ggplot2::theme_bw() + ggplot2::theme(plot.title = element_text(hjust = 0.5),
-                                                  legend.title = element_text(size = 10),
-                                                  legend.text = element_text(size = 10))
-
-  # fix coord ratio
-  if(!is.null(coord_fix_ratio)) {
-    pl <- pl + ggplot2::coord_fixed(ratio = coord_fix_ratio)
-  }
-
-  pl <- pl + ggplot2::labs(x = 'x coordinates', y = 'y coordinates')
-
-  # output plot
-  return(GiottoVisuals::plot_output_handler(
-    gobject = gobject,
-    plot_object = pl,
-    save_plot = save_plot,
-    return_plot = return_plot,
-    show_plot = show_plot,
-    default_save_name = default_save_name,
-    save_param = save_param,
-    else_return = NULL
-  ))
+                                    spat_unit = NULL,
+                                    feat_type = NULL,
+                                    spat_loc_name = NULL,
+                                    interaction_name = NULL,
+                                    cluster_column = NULL,
+                                    sdimx = "sdimx",
+                                    sdimy = "sdimy",
+                                    cell_color = NULL,
+                                    cell_color_code = NULL,
+                                    color_as_factor = TRUE,
+                                    show_other_cells = FALSE,
+                                    show_network = FALSE,
+                                    show_other_network = FALSE,
+                                    network_color = NULL,
+                                    spatial_network_name = "Delaunay_network",
+                                    show_grid = FALSE,
+                                    grid_color = NULL,
+                                    spatial_grid_name = "spatial_grid",
+                                    coord_fix_ratio = 1,
+                                    show_legend = TRUE,
+                                    point_size_select = 2,
+                                    point_select_border_col = "black",
+                                    point_select_border_stroke = 0.05,
+                                    point_size_other = 1,
+                                    point_alpha_other = 0.3,
+                                    point_other_border_col = "lightgrey",
+                                    point_other_border_stroke = 0.01,
+                                    show_plot = NULL,
+                                    return_plot = NULL,
+                                    save_plot = NULL,
+                                    save_param = list(),
+                                    default_save_name = "cellProximitySpatPlot2D") {
+    if (is.null(interaction_name)) {
+        stop("you need to specify at least one interaction name, ",
+            "run cellProximityEnrichment")
+    }
+
+    # Set feat_type and spat_unit
+    spat_unit <- set_default_spat_unit(
+        gobject = gobject,
+        spat_unit = spat_unit
+    )
+    feat_type <- set_default_feat_type(
+        gobject = gobject,
+        spat_unit = spat_unit,
+        feat_type = feat_type
+    )
+
+    # get information from all slots
+    cell_locations <- getSpatialLocations(
+        gobject = gobject,
+        spat_unit = spat_unit,
+        name = spat_loc_name,
+        output = "data.table",
+        copy_obj = FALSE
+    )
+
+    spatial_grid <- getSpatialGrid(
+        gobject = gobject,
+        spat_unit = spat_unit,
+        feat_type = feat_type,
+        name = spatial_grid_name,
+        return_grid_Obj = FALSE
+    )
+
+    cell_metadata <- getCellMetadata(gobject,
+        spat_unit = spat_unit,
+        feat_type = feat_type,
+        output = "data.table",
+        copy_obj = TRUE
+    )
+
+    spatial_network <- annotateSpatialNetwork(
+        gobject = gobject,
+        feat_type = feat_type,
+        spat_unit = spat_unit,
+        spatial_network_name = spatial_network_name,
+        cluster_column = cluster_column
+    )
+
+
+    # data.table variables
+    unified_int <- sdimx_begin <- sdimy_begin <- sdimx_end <- sdimy_end <-
+        x_start <- x_end <- y_start <- y_end <- cell_ID <- NULL
+
+    cell_IDs_to_keep <- unique(c(
+        spatial_network[unified_int %in% interaction_name]$to,
+        spatial_network[unified_int %in% interaction_name]$from
+    ))
+
+    if (show_other_cells) {
+        CellType <- strsplit(interaction_name, "--")
+        all_cell_IDs <- cell_metadata[cell_metadata[[
+            cluster_column]] == CellType[[1]][1] |
+            cell_metadata[[cluster_column]] == CellType[[1]][2], ]$cell_ID
+        other_cell_IDs <- setdiff(all_cell_IDs, cell_IDs_to_keep)
+    }
+
+
+    # annotated cell data ()
+    if (nrow(cell_metadata) == 0) {
+        cell_locations_metadata <- cell_locations
+    } else {
+        cell_locations_metadata <- merge(
+            cell_locations, cell_metadata, by = "cell_ID")
+    }
+
+
+    # first 2 dimensions need to be defined
+    if (is.null(sdimx) | is.null(sdimy)) {
+        message("first and second dimension need to be defined, ",
+            "default is first 2")
+        sdimx <- "sdimx"
+        sdimy <- "sdimy"
+    }
+
+    pl <- ggplot2::ggplot()
+    pl <- pl + ggplot2::theme_classic()
+
+    if (!is.null(spatial_network) & show_network == TRUE) {
+        if (is.null(network_color)) network_color <- "red"
+        if (show_other_network) {
+            pl <- pl + ggplot2::geom_segment(
+                data = spatial_network[!unified_int %in% interaction_name],
+                aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end,
+                    yend = sdimy_end),
+                color = "lightgrey", size = 0.5, alpha = 0.5
+            )
+        }
+        pl <- pl + ggplot2::geom_segment(
+            data = spatial_network[unified_int %in% interaction_name],
+            aes(x = sdimx_begin, y = sdimy_begin, xend = sdimx_end,
+                yend = sdimy_end),
+            color = network_color, size = 0.5, alpha = 0.5
+        )
+    }
+
+    if (!is.null(spatial_grid) & show_grid == TRUE) {
+        if (is.null(grid_color)) grid_color <- "black"
+        pl <- pl + ggplot2::geom_rect(
+            data = spatial_grid,
+            aes(xmin = x_start, xmax = x_end, ymin = y_start, ymax = y_end),
+            color = grid_color, fill = NA
+        )
+    }
+
+    # cell color default
+    if (is.null(cell_color)) {
+        cell_color <- "lightblue"
+        pl <- pl + ggplot2::geom_point(
+            data = cell_locations[!cell_ID %in% cell_IDs_to_keep],
+            aes_string(x = sdimx, y = sdimy),
+            show.legend = show_legend, shape = 21, fill = "lightgrey",
+            size = point_size_other
+        )
+        pl <- pl + ggplot2::geom_point(
+            data = cell_locations[cell_ID %in% cell_IDs_to_keep],
+            aes_string(x = sdimx, y = sdimy),
+            show.legend = show_legend, shape = 21, fill = cell_color,
+            size = point_size_select
+        )
+        if (show_other_cells) {
+            pl <- pl + ggplot2::geom_point(
+                data = cell_locations[cell_ID %in% other_cell_IDs],
+                aes_string(x = sdimx, y = sdimy),
+                show.legend = show_legend, shape = 21, fill = cell_color,
+                alpha = point_alpha_other,
+                size = point_size_select * 0.5
+            )
+        }
+    } else if (is.character(cell_color)) {
+        if (cell_color %in% colnames(cell_locations_metadata)) {
+            if (color_as_factor == TRUE) {
+                factor_data <- factor(cell_locations_metadata[[cell_color]])
+                cell_locations_metadata[[cell_color]] <- factor_data
+            }
+
+            pl <- pl + ggplot2::geom_point(
+                data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep],
+                aes_string(x = sdimx, y = sdimy),
+                fill = "lightgrey", shape = 21, size = point_size_other,
+                color = 
point_other_border_col,
+                stroke = point_other_border_stroke
+            )
+            pl <- pl + ggplot2::geom_point(
+                data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep],
+                aes_string(x = sdimx, y = sdimy, fill = cell_color),
+                show.legend = show_legend, shape = 21, size = point_size_select,
+                color = point_select_border_col,
+                stroke = point_select_border_stroke
+            )
+            if (show_other_cells) {
+                pl <- pl + ggplot2::geom_point(
+                    data = cell_locations_metadata[cell_ID %in% other_cell_IDs],
+                    aes_string(x = sdimx, y = sdimy, fill = cell_color),
+                    show.legend = show_legend, shape = 21,
+                    alpha = point_alpha_other,
+                    size = point_size_select * 0.5
+                )
+            }
+
+            if (!is.null(cell_color_code)) {
+                pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code)
+            } else if (color_as_factor == TRUE) {
+                number_colors <- length(unique(factor_data))
+                cell_color_code <- set_default_color_discrete_cell(
+                    instrs = instructions(gobject))(n = number_colors)
+                names(cell_color_code) <- unique(factor_data)
+                pl <- pl + ggplot2::scale_fill_manual(values = cell_color_code)
+            } else if (color_as_factor == FALSE) {
+                pl <- pl + set_default_color_continuous_cell(
+                    colors = cell_color_code,
+                    instrs = instructions(gobject),
+                    midpoint = NULL,
+                    style = "sequential",
+                    type_default = list(
+                        pal = c("blue", "red")
+                    )
+                )
+            }
+        } else {
+            pl <- pl + ggplot2::geom_point(
+                data = cell_locations_metadata[!cell_ID %in% cell_IDs_to_keep],
+                aes_string(x = sdimx, y = sdimy),
+                show.legend = show_legend, shape = 21, fill = "lightgrey",
+                size = point_size_other,
+                color = point_other_border_col,
+                stroke = point_other_border_stroke
+            )
+            pl <- pl + ggplot2::geom_point(
+                data = cell_locations_metadata[cell_ID %in% cell_IDs_to_keep],
+                aes_string(x = sdimx, y = sdimy),
+                show.legend = show_legend, shape = 21, fill = cell_color,
+                size = point_size_select,
+                color = point_select_border_col,
+                stroke = point_select_border_stroke
+            )
+        }
+    }
+
+    pl <- pl + ggplot2::theme_bw() + ggplot2::theme(
+        plot.title = element_text(hjust = 0.5),
+        legend.title = element_text(size = 10),
+        legend.text = element_text(size = 10)
+    )
+
+    # fix coord ratio
+    if (!is.null(coord_fix_ratio)) {
+        pl <- pl + ggplot2::coord_fixed(ratio = coord_fix_ratio)
+    }
+
+    pl <- pl + ggplot2::labs(x = "x coordinates", y = "y coordinates")
+
+    # output plot
+    return(GiottoVisuals::plot_output_handler(
+        gobject = gobject,
+        plot_object = pl,
+        save_plot = save_plot,
+        return_plot = return_plot,
+        show_plot = show_plot,
+        default_save_name = default_save_name,
+        save_param = save_param,
+        else_return = NULL
+    ))
 }
 
 #' @title cellProximitySpatPlot
 #' @name cellProximitySpatPlot
-#' @description Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode
+#' @description Visualize 2D cell-cell interactions according to spatial
+#' coordinates in ggplot mode
 #' @param gobject giotto object
 #' @inheritDotParams cellProximitySpatPlot2D -gobject
 #' @return ggplot
 #' @details Description of parameters.
 #' @export
-#' @seealso \code{\link{cellProximitySpatPlot2D}} and \code{\link{cellProximitySpatPlot3D}} for 3D
-cellProximitySpatPlot = function(gobject, ...) {
-
-  cellProximitySpatPlot2D(gobject = gobject, ...)
-
+#' @seealso \code{\link{cellProximitySpatPlot2D}} and
+#' \code{\link{cellProximitySpatPlot3D}} for 3D
+cellProximitySpatPlot <- function(gobject, ...) {
+    cellProximitySpatPlot2D(gobject = gobject, ...)
} #' @title cellProximitySpatPlot3D #' @name cellProximitySpatPlot3D -#' @description Visualize 3D cell-cell interactions according to spatial coordinates in plotly mode +#' @description Visualize 3D cell-cell interactions according to spatial +#' coordinates in plotly mode #' @inheritParams data_access_params #' @inheritParams plot_output_params #' @inheritParams plot_spatnet_params @@ -2705,105 +3145,107 @@ cellProximitySpatPlot = function(gobject, ...) { #' @return plotly #' @details Description of parameters. #' @export -cellProximitySpatPlot3D = function(gobject, - interaction_name = NULL, - cluster_column = NULL, - sdimx = "sdimx", - sdimy = "sdimy", - sdimz = "sdimz", - cell_color = NULL, - cell_color_code = NULL, - color_as_factor = TRUE, - show_other_cells = TRUE, - show_network = TRUE, - show_other_network = FALSE, - network_color = NULL, - spatial_network_name = 'Delaunay_network', - show_grid = FALSE, - grid_color = NULL, - spatial_grid_name = 'spatial_grid', - show_legend = TRUE, - point_size_select = 4, - point_size_other = 2, - point_alpha_other = 0.5, - axis_scale = c("cube","real","custom"), - custom_ratio = NULL, - x_ticks = NULL, - y_ticks = NULL, - z_ticks = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'cellProximitySpatPlot3D', - ...) { - if (is.null(sdimz)){ - pl = .cellProximityVisPlot_2D_plotly(gobject = gobject, - interaction_name = interaction_name, - cluster_column = cluster_column, - sdimx = sdimx, - sdimy = sdimy, - cell_color = cell_color, - cell_color_code = cell_color_code, - color_as_factor = color_as_factor, - show_other_cells = show_other_cells, - show_network = show_network, - show_other_network = show_other_network, - network_color = network_color, - spatial_network_name = spatial_network_name, - show_grid = show_grid, - grid_color = grid_color, - spatial_grid_name = spatial_grid_name, - show_legend = show_legend, - point_size_select = point_size_select, - point_size_other = point_size_other, - point_alpha_other = point_alpha_other, - axis_scale = axis_scale, - custom_ratio = custom_ratio, - x_ticks = x_ticks, - y_ticks = y_ticks, - ...) - } - else{ - pl = .cellProximityVisPlot_3D_plotly(gobject = gobject, - interaction_name = interaction_name, - cluster_column = cluster_column, - sdimx = sdimx, - sdimy = sdimy, - sdimz = sdimz, - cell_color = cell_color, - cell_color_code = cell_color_code, - color_as_factor = color_as_factor, - show_other_cells = show_other_cells, - show_network = show_network, - show_other_network = show_other_network, - network_color = network_color, - spatial_network_name = spatial_network_name, - show_grid = show_grid, - grid_color = grid_color, - spatial_grid_name = spatial_grid_name, - show_legend = show_legend, - point_size_select = point_size_select, - point_size_other = point_size_other, - point_alpha_other = point_alpha_other, - axis_scale = axis_scale, - custom_ratio = custom_ratio, - x_ticks = x_ticks, - y_ticks = y_ticks, - z_ticks = z_ticks, - ...) 
- } - - # output plot - return(GiottoVisuals::plot_output_handler( - gobject = gobject, - plot_object = pl, - save_plot = save_plot, - return_plot = return_plot, - show_plot = show_plot, - default_save_name = default_save_name, - save_param = save_param, - else_return = NULL - )) -} +cellProximitySpatPlot3D <- function(gobject, + interaction_name = NULL, + cluster_column = NULL, + sdimx = "sdimx", + sdimy = "sdimy", + sdimz = "sdimz", + cell_color = NULL, + cell_color_code = NULL, + color_as_factor = TRUE, + show_other_cells = TRUE, + show_network = TRUE, + show_other_network = FALSE, + network_color = NULL, + spatial_network_name = "Delaunay_network", + show_grid = FALSE, + grid_color = NULL, + spatial_grid_name = "spatial_grid", + show_legend = TRUE, + point_size_select = 4, + point_size_other = 2, + point_alpha_other = 0.5, + axis_scale = c("cube", "real", "custom"), + custom_ratio = NULL, + x_ticks = NULL, + y_ticks = NULL, + z_ticks = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "cellProximitySpatPlot3D", + ...) { + if (is.null(sdimz)) { + pl <- .cellProximityVisPlot_2D_plotly( + gobject = gobject, + interaction_name = interaction_name, + cluster_column = cluster_column, + sdimx = sdimx, + sdimy = sdimy, + cell_color = cell_color, + cell_color_code = cell_color_code, + color_as_factor = color_as_factor, + show_other_cells = show_other_cells, + show_network = show_network, + show_other_network = show_other_network, + network_color = network_color, + spatial_network_name = spatial_network_name, + show_grid = show_grid, + grid_color = grid_color, + spatial_grid_name = spatial_grid_name, + show_legend = show_legend, + point_size_select = point_size_select, + point_size_other = point_size_other, + point_alpha_other = point_alpha_other, + axis_scale = axis_scale, + custom_ratio = custom_ratio, + x_ticks = x_ticks, + y_ticks = y_ticks, + ... + ) + } else { + pl <- .cellProximityVisPlot_3D_plotly( + gobject = gobject, + interaction_name = interaction_name, + cluster_column = cluster_column, + sdimx = sdimx, + sdimy = sdimy, + sdimz = sdimz, + cell_color = cell_color, + cell_color_code = cell_color_code, + color_as_factor = color_as_factor, + show_other_cells = show_other_cells, + show_network = show_network, + show_other_network = show_other_network, + network_color = network_color, + spatial_network_name = spatial_network_name, + show_grid = show_grid, + grid_color = grid_color, + spatial_grid_name = spatial_grid_name, + show_legend = show_legend, + point_size_select = point_size_select, + point_size_other = point_size_other, + point_alpha_other = point_alpha_other, + axis_scale = axis_scale, + custom_ratio = custom_ratio, + x_ticks = x_ticks, + y_ticks = y_ticks, + z_ticks = z_ticks, + ... + ) + } + # output plot + return(GiottoVisuals::plot_output_handler( + gobject = gobject, + plot_object = pl, + save_plot = save_plot, + return_plot = return_plot, + show_plot = show_plot, + default_save_name = default_save_name, + save_param = save_param, + else_return = NULL + )) +} diff --git a/R/spdep.R b/R/spdep.R index a328d4e99..1236aa89b 100644 --- a/R/spdep.R +++ b/R/spdep.R @@ -2,113 +2,124 @@ #' #' @param gobject Input a Giotto object. #' @param method Specify a method name to compute auto correlation. -#' Available methods include \code{"geary.test", "lee.test", "lm.morantest","moran.test"}. +#' Available methods include +#' \code{"geary.test", "lee.test", "lm.morantest","moran.test"}. 
#' @param spat_unit spatial unit #' @param feat_type feature type #' @param expression_values expression values to use, default = normalized -#' @param spatial_network_to_use spatial network to use, default = spatial_network +#' @param spatial_network_to_use spatial network to use, +#' default = spatial_network #' @param verbose be verbose #' @param return_gobject if FALSE, results are returned as data.table. #' If TRUE, values will be appended to feature metadata #' @return A data table with computed values for each feature. #' @export -spdepAutoCorr <- function (gobject, - method = c("geary.test", "lee.test", "lm.morantest", "moran.test"), - spat_unit = NULL, - feat_type = NULL, - expression_values = "normalized", - spatial_network_to_use = "spatial_network", - return_gobject = FALSE, - verbose = FALSE){ - - # Check and match the specified method argument - method <- match.arg(method) - - # Check gobject and set spat_unit and feat_type - if(!is.null(gobject)) { - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - } - else { - stop('gobject has not been provided\n') - } - - # Evaluate spatial autocorrelation using Giotto - resultSpdepCor <- .evaluate_autocor_input( - gobject = gobject, - use_ext_vals = FALSE, - use_sn = TRUE, - use_expr = TRUE, - use_meta = FALSE, - spat_unit = spat_unit, - feat_type = feat_type, - feats = NULL, - data_to_use = "expression", - expression_values = expression_values, - meta_cols = NULL, - spatial_network_to_use = spatial_network_to_use, - wm_method = "distance", - wm_name = "spat_weights", - node_values = NULL, - weight_matrix = NULL, - verbose = verbose - ) - - - # Extract feats and weight_matrix from the result - feat <- resultSpdepCor$feats - weight_matrix <- resultSpdepCor$weight_matrix - use_values <- resultSpdepCor$use_values - - #progressr - nfeats = length(feat) - step_size = ceiling(nfeats/10L) - - result_list <- list() - progressr::with_progress({ - if(step_size > 1) pb = progressr::progressor(steps = nfeats/step_size) - result_list <- lapply_flex( - seq_along(feat), - future.packages = c('data.table', 'spdep'), - function(feat_value){ - callSpdepVar <- callSpdep( - method = method, - x = use_values[,feat_value], - listw = spdep::mat2listw(weight_matrix, style = "W") +spdepAutoCorr <- function(gobject, + method = c("geary.test", "lee.test", "lm.morantest", "moran.test"), + spat_unit = NULL, + feat_type = NULL, + expression_values = "normalized", + spatial_network_to_use = "spatial_network", + return_gobject = FALSE, + verbose = FALSE) { + # Check and match the specified method argument + method <- match.arg(method) + + # Check gobject and set spat_unit and feat_type + if (!is.null(gobject)) { + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type ) - # Extract the estimated value from the result - result_value <- callSpdepVar$estimate[1] - temp_dt <- data.table(feat_ID = feat[feat_value], value = result_value) - # increment progress - if(exists('pb')) if(feat_value %% step_size == 0) pb() - return(temp_dt) - } + } else { + stop("gobject has not been provided") + } + + # Evaluate spatial autocorrelation using Giotto + resultSpdepCor <- .evaluate_autocor_input( + gobject = gobject, + use_ext_vals = FALSE, + use_sn = TRUE, + use_expr = TRUE, + use_meta = FALSE, + spat_unit = 
spat_unit, + feat_type = feat_type, + feats = NULL, + data_to_use = "expression", + expression_values = expression_values, + meta_cols = NULL, + spatial_network_to_use = spatial_network_to_use, + wm_method = "distance", + wm_name = "spat_weights", + node_values = NULL, + weight_matrix = NULL, + verbose = verbose ) - }) - # combine results - result_dt <- data.table::rbindlist(result_list) - - # Return the resulting datatable - - if(isTRUE(return_gobject)) { - if(isTRUE(verbose)) wrap_msg('Appending', method, - 'results to feature metadata: fDataDT()') - gobject = addFeatMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = result_dt, - by_column = TRUE, - column_feat_ID = 'feat_ID') - - return(gobject) - } else { - return(result_dt) - } + + # Extract feats and weight_matrix from the result + feat <- resultSpdepCor$feats + weight_matrix <- resultSpdepCor$weight_matrix + use_values <- resultSpdepCor$use_values + + # progressr + nfeats <- length(feat) + step_size <- ceiling(nfeats / 10L) + + result_list <- list() + progressr::with_progress({ + if (step_size > 1) pb <- progressr::progressor( + steps = nfeats / step_size) + result_list <- lapply_flex( + seq_along(feat), + future.packages = c("data.table", "spdep"), + function(feat_value) { + callSpdepVar <- callSpdep( + method = method, + x = use_values[, feat_value], + listw = spdep::mat2listw(weight_matrix, style = "W") + ) + # Extract the estimated value from the result + result_value <- callSpdepVar$estimate[1] + temp_dt <- data.table( + feat_ID = feat[feat_value], value = result_value) + # increment progress + if (exists("pb")) if (feat_value %% step_size == 0) pb() + return(temp_dt) + } + ) + }) + # combine results + result_dt <- data.table::rbindlist(result_list) + + # Return the resulting datatable + + if (isTRUE(return_gobject)) { + if (isTRUE(verbose)) { + wrap_msg( + "Appending", method, + "results to feature metadata: fDataDT()" + ) + } + gobject <- addFeatMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = result_dt, + by_column = TRUE, + column_feat_ID = "feat_ID" + ) + + return(gobject) + } else { + return(result_dt) + } } @@ -117,80 +128,84 @@ spdepAutoCorr <- function (gobject, #' @param method Specify method name to call from spdep with its required #' parameters. #' @param ... Additional parameters for the function. See spdep documentation -#'for relevant parameters. +#' for relevant parameters. #' @return Computed statistics from the specified method. #' @export #' @seealso \url{https://cran.r-project.org/web/packages/spdep/index.html} -callSpdep <- function (method, ...) { +callSpdep <- function(method, ...) { + # Load the 'spdep' package if not already installed + package_check(pkg_name = "spdep", repository = "CRAN", optional = FALSE) - # Load the 'spdep' package if not already installed - package_check(pkg_name = "spdep", repository = "CRAN", optional = FALSE) + # Check if 'method' argument is NULL, if so, stop with an error + if (is.null(method)) { + stop("The 'method' argument has not been provided. Please specify a + valid method.") + } - # Check if 'method' argument is NULL, if so, stop with an error - if (is.null(method)){ - stop ("The 'method' argument has not been provided. 
Please specify a valid method.") - } + # Check if 'method' exists in the 'spdep' package, if not, stop with an + # error + method <- try(eval(get(method, envir = loadNamespace("spdep"))), + silent = TRUE + ) + if (inherits(method, "try-error")) { + stop(paste( + "Invalid method name. Method", method, + "is not available in the spdep package." + )) + } - # Check if 'method' exists in the 'spdep' package, if not, stop with an error - method <- try(eval(get(method, envir = loadNamespace('spdep'))), - silent = TRUE) - if (inherits(method, 'try-error')) { - stop(paste("Invalid method name. Method", method, - "is not available in the spdep package.")) - } + # Fetch the arguments of the 'method' from 'spdep' + allArgs <- args(method) %>% + as.list() %>% + names() - # Fetch the arguments of the 'method' from 'spdep' - allArgs <- args(method) %>% - as.list() %>% - names() + # Capture arguments provided by the user + methodparam <- list(...) - # Capture arguments provided by the user - methodparam <- list (...) + # Check if the user provided the listw argument + if ("listw" %in% names(methodparam)) { + listw_arg <- methodparam$listw - # Check if the user provided the listw argument - if ("listw" %in% names(methodparam)) { - listw_arg <- methodparam$listw + # Check if listw_arg is a matrix + if (is.matrix(listw_arg)) { + # Convert the matrix to a listw object + listw_arg <- spdep::mat2listw(listw_arg, style = "W") + } else if (!inherits(listw_arg, "listw")) { + stop("listw must be either a matrix or a listw object.") + } - # Check if listw_arg is a matrix - if (is.matrix(listw_arg)) { - # Convert the matrix to a listw object - listw_arg <- spdep::mat2listw(listw_arg, style = "W") + # Update the listw argument in methodparam + methodparam$listw <- listw_arg } - else if (!inherits(listw_arg, "listw")) { - stop("listw must be either a matrix or a listw object.") - } - - # Update the listw argument in methodparam - methodparam$listw <- listw_arg - } - # Check if all user-provided arguments are valid - if (all(!(names(methodparam))%in% allArgs)){ - stop("Invalid or missing parameters.") - } - # A vector of specified arguments that trigger 'spW <- spweights.constants()' - requiredArgs <- c("n", "n1", "n2", "n3", "nn", "S0", "S1", "S2") - - # Check if any of the specified arguments are required by the method - if (any(requiredArgs %in% allArgs)) { - # Obtain arguments from 'spweights.constants' - spW <- spdep::spweights.constants(listw = methodparam$listw) - # Combine user-provided arguments and 'spW', checking only against 'feats' value - combinedParams <- append(methodparam, spW) - }else{ - combinedParams <- methodparam - } + # Check if all user-provided arguments are valid + if (all(!(names(methodparam)) %in% allArgs)) { + stop("Invalid or missing parameters.") + } + # A vector of specified arguments that trigger + # 'spW <- spweights.constants()' + requiredArgs <- c("n", "n1", "n2", "n3", "nn", "S0", "S1", "S2") + + # Check if any of the specified arguments are required by the method + if (any(requiredArgs %in% allArgs)) { + # Obtain arguments from 'spweights.constants' + spW <- spdep::spweights.constants(listw = methodparam$listw) + # Combine user-provided arguments and 'spW', checking only against + # 'feats' value + combinedParams <- append(methodparam, spW) + } else { + combinedParams <- methodparam + } - # Identify common parameters between user and 'spdep' - commonParams <- intersect(names(combinedParams), allArgs) + # Identify common parameters between user and 'spdep' + commonParams <- 
intersect(names(combinedParams), allArgs) - # Create a named list of common parameters - combinedParams <- combinedParams[commonParams] + # Create a named list of common parameters + combinedParams <- combinedParams[commonParams] - # Call the function with its parameters - do.call(method, combinedParams) + # Call the function with its parameters + do.call(method, combinedParams) } - diff --git a/R/suite_reexports.R b/R/suite_reexports.R index b6529e3eb..f4c650eb5 100644 --- a/R/suite_reexports.R +++ b/R/suite_reexports.R @@ -1,4 +1,3 @@ - # GiottoUtils #### #' @export GiottoUtils::`%>%` @@ -586,14 +585,3 @@ GiottoVisuals::spatNetwDistributionsKneighbors GiottoVisuals::subsetSankeySet #' @export GiottoVisuals::violinPlot - - - - - - - - - - - diff --git a/R/variable_genes.R b/R/variable_genes.R index 6a570d7d7..ddf875b19 100644 --- a/R/variable_genes.R +++ b/R/variable_genes.R @@ -1,43 +1,45 @@ +.calc_cov_group_hvf <- function(feat_in_cells_detected, + nr_expression_groups = 20, + zscore_threshold = 1, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL) { + # NSE vars + cov_group_zscore <- cov <- selected <- mean_expr <- NULL + + steps <- 1 / nr_expression_groups + prob_sequence <- seq(0, 1, steps) + prob_sequence[length(prob_sequence)] <- 1 + expr_group_breaks <- stats::quantile( + feat_in_cells_detected$mean_expr, probs = prob_sequence) + + ## remove zero's from cuts if there are too many and make first group zero + if (any(duplicated(expr_group_breaks))) { + m_expr_vector <- feat_in_cells_detected$mean_expr + expr_group_breaks <- stats::quantile( + m_expr_vector[m_expr_vector > 0], probs = prob_sequence) + expr_group_breaks[[1]] <- 0 + } + expr_groups <- cut( + x = feat_in_cells_detected$mean_expr, + breaks = expr_group_breaks, + labels = paste0("group_", 1:nr_expression_groups), + include.lowest = TRUE + ) + feat_in_cells_detected[, expr_groups := expr_groups] + feat_in_cells_detected[, cov_group_zscore := scale(cov), by = expr_groups] + feat_in_cells_detected[, selected := ifelse( + cov_group_zscore > zscore_threshold, "yes", "no")] -.calc_cov_group_hvf = function(feat_in_cells_detected, - nr_expression_groups = 20, - zscore_threshold = 1, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL) { - - # NSE vars - cov_group_zscore <- cov <- selected <- mean_expr <- NULL - - steps = 1/nr_expression_groups - prob_sequence = seq(0, 1, steps) - prob_sequence[length(prob_sequence)] = 1 - expr_group_breaks = stats::quantile(feat_in_cells_detected$mean_expr, probs = prob_sequence) - - ## remove zero's from cuts if there are too many and make first group zero - if(any(duplicated(expr_group_breaks))) { - m_expr_vector = feat_in_cells_detected$mean_expr - expr_group_breaks = stats::quantile(m_expr_vector[m_expr_vector > 0], probs = prob_sequence) - expr_group_breaks[[1]] = 0 - } - - expr_groups = cut(x = feat_in_cells_detected$mean_expr, - breaks = expr_group_breaks, - labels = paste0('group_', 1:nr_expression_groups), - include.lowest = TRUE) - feat_in_cells_detected[, expr_groups := expr_groups] - feat_in_cells_detected[, cov_group_zscore := scale(cov), by = expr_groups] - feat_in_cells_detected[, selected := ifelse(cov_group_zscore > zscore_threshold, 'yes', 'no')] - - if(any(isTRUE(show_plot), isTRUE(return_plot), isTRUE(save_plot))) { - pl = .create_cov_group_hvf_plot(feat_in_cells_detected, nr_expression_groups) - - return(list(dt = feat_in_cells_detected, pl = pl)) - } else { - return(list(dt = feat_in_cells_detected)) - } + if (any(isTRUE(show_plot), isTRUE(return_plot), 
isTRUE(save_plot))) { + pl <- .create_cov_group_hvf_plot( + feat_in_cells_detected, nr_expression_groups) + return(list(dt = feat_in_cells_detected, pl = pl)) + } else { + return(list(dt = feat_in_cells_detected)) + } } @@ -46,164 +48,163 @@ -.calc_cov_loess_hvf = function(feat_in_cells_detected, - difference_in_cov = 0.1, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL) { +.calc_cov_loess_hvf <- function(feat_in_cells_detected, + difference_in_cov = 0.1, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL) { + # NSE vars + cov_diff <- pred_cov_feats <- selected <- NULL - # NSE vars - cov_diff <- pred_cov_feats <- selected <- NULL + # create loess regression + loess_formula <- paste0("cov~log(mean_expr)") + var_col <- "cov" - # create loess regression - loess_formula = paste0('cov~log(mean_expr)') - var_col <- 'cov' + loess_model_sample <- stats::loess( + loess_formula, data = feat_in_cells_detected) + feat_in_cells_detected$pred_cov_feats <- stats::predict( + loess_model_sample, newdata = feat_in_cells_detected) + feat_in_cells_detected[, cov_diff := get(var_col) - pred_cov_feats, + by = 1:nrow(feat_in_cells_detected)] + data.table::setorder(feat_in_cells_detected, -cov_diff) + feat_in_cells_detected[, selected := ifelse( + cov_diff > difference_in_cov, "yes", "no")] - loess_model_sample = stats::loess(loess_formula, data = feat_in_cells_detected) - feat_in_cells_detected$pred_cov_feats = stats::predict(loess_model_sample, newdata = feat_in_cells_detected) - feat_in_cells_detected[, cov_diff := get(var_col)-pred_cov_feats, by = 1:nrow(feat_in_cells_detected)] - data.table::setorder(feat_in_cells_detected, -cov_diff) - feat_in_cells_detected[, selected := ifelse(cov_diff > difference_in_cov, 'yes', 'no')] + if (any(isTRUE(show_plot), isTRUE(return_plot), isTRUE(save_plot))) { + pl <- .create_cov_loess_hvf_plot( + feat_in_cells_detected, difference_in_cov, var_col) - if(any(isTRUE(show_plot), isTRUE(return_plot), isTRUE(save_plot))) { - pl = .create_cov_loess_hvf_plot(feat_in_cells_detected, difference_in_cov, var_col) - - return(list(dt = feat_in_cells_detected, pl = pl)) - } else { - return(list(dt = feat_in_cells_detected)) - } + return(list(dt = feat_in_cells_detected, pl = pl)) + } else { + return(list(dt = feat_in_cells_detected)) + } } -.calc_var_hvf = function(scaled_matrix, - var_threshold = 1.5, - var_number = NULL, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - use_parallel = FALSE) { - - # NSE vars - var <- selected <- NULL - - if (isTRUE(use_parallel)) { - test <- apply(X = scaled_matrix, MARGIN = 1, FUN = function(x) var(x)) - } else { - test <- future.apply::future_apply( - X = scaled_matrix, MARGIN = 1, FUN = function(x) var(x), future.seed = TRUE - ) - } - - test = sort(test, decreasing = TRUE) - - dt_res = data.table::data.table(feats = names(test), var = test) - - if(!is.null(var_number) & is.numeric(var_number)) { - dt_res[, selected := 1:.N] - dt_res[, selected := ifelse(selected <= var_number, 'yes', 'no')] - } else { - dt_res[, selected := ifelse(var >= var_threshold, 'yes', 'no')] - } +.calc_var_hvf <- function(scaled_matrix, + var_threshold = 1.5, + var_number = NULL, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + use_parallel = FALSE) { + # NSE vars + var <- selected <- NULL + if (isTRUE(use_parallel)) { + test <- apply(X = scaled_matrix, MARGIN = 1, FUN = function(x) var(x)) + } else { + test <- future.apply::future_apply( + X = scaled_matrix, MARGIN = 1, FUN = function(x) var(x), + future.seed = TRUE + ) + } 
- if(isTRUE(show_plot) || - isTRUE(return_plot) || - isTRUE(save_plot)) { + test <- sort(test, decreasing = TRUE) - dt_res[, rank := 1:.N] - pl <- .create_calc_var_hvf_plot(dt_res) + dt_res <- data.table::data.table(feats = names(test), var = test) + if (!is.null(var_number) & is.numeric(var_number)) { + dt_res[, selected := 1:.N] + dt_res[, selected := ifelse(selected <= var_number, "yes", "no")] + } else { + dt_res[, selected := ifelse(var >= var_threshold, "yes", "no")] + } - dt_res_final = data.table::copy(dt_res) - dt_res_final[, rank := NULL] - return(list(dt = dt_res_final, pl = pl)) + if (isTRUE(show_plot) || + isTRUE(return_plot) || + isTRUE(save_plot)) { + dt_res[, rank := 1:.N] + pl <- .create_calc_var_hvf_plot(dt_res) - } else { - return(list(dt = dt_res)) - } + dt_res_final <- data.table::copy(dt_res) + dt_res_final[, rank := NULL] + return(list(dt = dt_res_final, pl = pl)) + } else { + return(list(dt = dt_res)) + } } .calc_expr_general_stats <- function(expr_values, expression_threshold) { - # NSE vars - gini <- NULL - - ## create data.table with relevant statistics ## - feat_in_cells_detected <- data.table::data.table( - feats = rownames(expr_values), - nr_cells = rowSums_flex(expr_values > expression_threshold), - total_expr = rowSums_flex(expr_values), - mean_expr = rowMeans_flex(expr_values), - sd = unlist(apply(expr_values, 1, sd)) - ) - - # calculate gini rowwise - gini_level <- unlist(apply(expr_values, MARGIN = 1, mygini_fun)) - feat_in_cells_detected[, gini := gini_level] - - return(feat_in_cells_detected) + # NSE vars + gini <- NULL + + ## create data.table with relevant statistics ## + feat_in_cells_detected <- data.table::data.table( + feats = rownames(expr_values), + nr_cells = rowSums_flex(expr_values > expression_threshold), + total_expr = rowSums_flex(expr_values), + mean_expr = rowMeans_flex(expr_values), + sd = unlist(apply(expr_values, 1, sd)) + ) + + # calculate gini rowwise + gini_level <- unlist(apply(expr_values, MARGIN = 1, mygini_fun)) + feat_in_cells_detected[, gini := gini_level] + + return(feat_in_cells_detected) } .calc_expr_cov_stats <- function(expr_values, expression_threshold) { + # NSE vars + cov <- sd <- mean_expr <- NULL - # NSE vars - cov <- sd <- mean_expr <- NULL - - # get general expression statistics and gini data.table - feat_in_cells_detected <- .calc_expr_general_stats( - expr_values, expression_threshold - ) + # get general expression statistics and gini data.table + feat_in_cells_detected <- .calc_expr_general_stats( + expr_values, expression_threshold + ) - # calculate cov using sd and mean_expr from general stats DT - feat_in_cells_detected[, cov := (sd/mean_expr)] + # calculate cov using sd and mean_expr from general stats DT + feat_in_cells_detected[, cov := (sd / mean_expr)] - return(feat_in_cells_detected) + return(feat_in_cells_detected) } .calc_expr_cov_stats_parallel <- function( - expr_values, - expression_threshold, - cores = GiottoUtils::determine_cores() -) { - - # NSE vars - cov <- sd <- mean_expr <- NULL - - # setup chunk rows to use for each parallel based on number of cores - chunk_rows <- seq(nrow(expr_values)) %>% - split(., cut(., cores)) - - # params to pass into the future_lapply - fparams <- list( - calc_fun = .calc_expr_general_stats, - expression_threshold = expression_threshold - ) - - # parallelized calculation of general stats - chunk_stats_dt_list <- lapply_flex( - chunk_rows, - function(r_idx, fparams) { - fparams$calc_fun(expr_values = expr_values[r_idx,], - expression_threshold = 
fparams$expression_threshold) - }, - fparams = fparams, - cores = cores, - future.seed = TRUE - ) - - # combine stats tables - feat_in_cells_detected <- data.table::rbindlist(chunk_stats_dt_list) - - # calculate cov using sd and mean_expr from combined general stats DT - feat_in_cells_detected[, cov := (sd/mean_expr)] - - return(feat_in_cells_detected) + expr_values, + expression_threshold, + cores = GiottoUtils::determine_cores()) { + # NSE vars + cov <- sd <- mean_expr <- NULL + + # setup chunk rows to use for each parallel based on number of cores + chunk_rows <- seq(nrow(expr_values)) %>% + split(., cut(., cores)) + + # params to pass into the future_lapply + fparams <- list( + calc_fun = .calc_expr_general_stats, + expression_threshold = expression_threshold + ) + + # parallelized calculation of general stats + chunk_stats_dt_list <- lapply_flex( + chunk_rows, + function(r_idx, fparams) { + fparams$calc_fun( + expr_values = expr_values[r_idx, ], + expression_threshold = fparams$expression_threshold + ) + }, + fparams = fparams, + cores = cores, + future.seed = TRUE + ) + + # combine stats tables + feat_in_cells_detected <- data.table::rbindlist(chunk_stats_dt_list) + + # calculate cov using sd and mean_expr from combined general stats DT + feat_in_cells_detected[, cov := (sd / mean_expr)] + + return(feat_in_cells_detected) } @@ -218,229 +219,263 @@ #' @param feat_type feature type #' @param expression_values expression values to use #' @param method method to calculate highly variable features -#' @param reverse_log_scale reverse log-scale of expression values (default = FALSE) +#' @param reverse_log_scale reverse log-scale of expression values +#' (default = FALSE) #' @param logbase if `reverse_log_scale` is TRUE, which log base was used? #' @param expression_threshold expression threshold to consider a gene detected -#' @param nr_expression_groups (cov_groups) number of expression groups for cov_groups +#' @param nr_expression_groups (cov_groups) number of expression groups for +#' cov_groups #' @param zscore_threshold (cov_groups) zscore to select hvg for cov_groups #' @param HVFname name for highly variable features in cell metadata -#' @param difference_in_cov (cov_loess) minimum difference in coefficient of variance required -#' @param var_threshold (var_p_resid) variance threshold for features for var_p_resid method -#' @param var_number (var_p_resid) number of top variance features for var_p_resid method -#' @param random_subset random subset to perform HVF detection on. Passing `NULL` -#' runs HVF on all cells. +#' @param difference_in_cov (cov_loess) minimum difference in coefficient of +#' variance required +#' @param var_threshold (var_p_resid) variance threshold for features for +#' var_p_resid method +#' @param var_number (var_p_resid) number of top variance features for +#' var_p_resid method +#' @param random_subset random subset to perform HVF detection on. +#' Passing `NULL` runs HVF on all cells. #' @param set_seed logical. whether to set a seed when random_subset is used #' @param seed_number seed number to use when random_subset is used #' @param show_plot show plot #' @param return_plot return ggplot object (overridden by `return_gobject`) #' @param save_plot logical. 
directly save the plot -#' @param save_param list of saving parameters from [GiottoVisuals::all_plots_save_function()] -#' @param default_save_name default save name for saving, don't change, change save_name in save_param +#' @param save_param list of saving parameters from +#' [GiottoVisuals::all_plots_save_function()] +#' @param default_save_name default save name for saving, don't change, change +#' save_name in save_param #' @param return_gobject boolean: return giotto object (default = TRUE) -#' @return giotto object highly variable features appended to feature metadata (`fDataDT()`) +#' @return giotto object highly variable features appended to feature metadata +#' (`fDataDT()`) #' @details #' Currently we provide 2 ways to calculate highly variable genes: #' #' \strong{1. high coeff of variance (COV) within groups: } \cr -#' First genes are binned (\emph{nr_expression_groups}) into average expression groups and -#' the COV for each feature is converted into a z-score within each bin. Features with a z-score -#' higher than the threshold (\emph{zscore_threshold}) are considered highly variable. \cr +#' First genes are binned (\emph{nr_expression_groups}) into average expression +#' groups and the COV for each feature is converted into a z-score within each +#' bin. Features with a z-score higher than the threshold +#' (\emph{zscore_threshold}) are considered highly variable. \cr #' #' \strong{2. high COV based on loess regression prediction: } \cr -#' A predicted COV is calculated for each feature using loess regression (COV~log(mean expression)) -#' Features that show a higher than predicted COV (\emph{difference_in_cov}) are considered highly variable. \cr +#' A predicted COV is calculated for each feature using loess regression +#' (COV~log(mean expression)) +#' Features that show a higher than predicted COV (\emph{difference_in_cov}) +#' are considered highly variable. 
\cr #' #' @md #' @export calculateHVF <- function(gobject, - spat_unit = NULL, - feat_type = NULL, - expression_values = c('normalized', 'scaled', 'custom'), - method = c('cov_groups','cov_loess', 'var_p_resid'), - reverse_log_scale = FALSE, - logbase = 2, - expression_threshold = 0, - nr_expression_groups = 20, - zscore_threshold = 1.5, - HVFname = 'hvf', - difference_in_cov = 0.1, - var_threshold = 1.5, - var_number = NULL, - random_subset = NULL, - set_seed = TRUE, - seed_number = 1234, - show_plot = NULL, - return_plot = NULL, - save_plot = NULL, - save_param = list(), - default_save_name = 'HVFplot', - return_gobject = TRUE) { - - # NSE vars - selected = feats = var = NULL - - # determine whether to use parallel functions - # Do not use future if future packages are not installed - # Do not use future if plan is "sequential" - has_future <- requireNamespace("future.apply", quietly = TRUE) && - requireNamespace("future", quietly = TRUE) - use_parallel <- ifelse(has_future, - !("sequential" %in% class(future::plan())), - FALSE) - - # Set feat_type and spat_unit - spat_unit = set_default_spat_unit(gobject = gobject, - spat_unit = spat_unit) - feat_type = set_default_feat_type(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type) - - # expression values to be used - values = match.arg(expression_values, unique(c('normalized', 'scaled', 'custom', expression_values))) - expr_values = get_expression_values(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - values = values, - output = 'matrix') - - # not advised - if(isTRUE(reverse_log_scale)) { - expr_values = (logbase^expr_values)-1 - } - - # create a random subset if random_subset is not NULL - if (!is.null(random_subset)) { - if (isTRUE(set_seed)) set.seed(seed = seed_number) - - random_selection <- sort(sample(1:ncol(expr_values), random_subset)) - expr_values <- expr_values[, random_selection] - - if (isTRUE(set_seed)) GiottoUtils::random_seed() - } - - - - # print, return and save parameters - show_plot = ifelse(is.na(show_plot), readGiottoInstructions(gobject, param = 'show_plot'), show_plot) - save_plot = ifelse(is.na(save_plot), readGiottoInstructions(gobject, param = 'save_plot'), save_plot) - return_plot = ifelse(is.na(return_plot), readGiottoInstructions(gobject, param = 'return_plot'), return_plot) - - - # method to use - method = match.arg(method, choices = c('cov_groups', 'cov_loess', 'var_p_resid')) - # select function to use based on whether future parallelization is planned - calc_cov_fun <- ifelse( - use_parallel, - .calc_expr_cov_stats_parallel, - .calc_expr_cov_stats - ) - - results <- switch( - method, - "var_p_resid" = { - .calc_var_hvf( - scaled_matrix = expr_values, - var_threshold = var_threshold, - var_number = var_number, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot, - use_parallel = use_parallel - ) - }, - "cov_groups" = { - calc_cov_fun(expr_values, expression_threshold) %>% - .calc_cov_group_hvf(nr_expression_groups = nr_expression_groups, - zscore_threshold = zscore_threshold, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot) - }, - "cov_loess" = { - calc_cov_fun(expr_values, expression_threshold) %>% - .calc_cov_loess_hvf(difference_in_cov = difference_in_cov, - show_plot = show_plot, - return_plot = return_plot, - save_plot = save_plot) - } - ) - - ## unpack results - feat_in_cells_detected = results[['dt']] - pl = results[['pl']] + spat_unit = NULL, + feat_type = NULL, + expression_values = c("normalized", 
"scaled", "custom"), + method = c("cov_groups", "cov_loess", "var_p_resid"), + reverse_log_scale = FALSE, + logbase = 2, + expression_threshold = 0, + nr_expression_groups = 20, + zscore_threshold = 1.5, + HVFname = "hvf", + difference_in_cov = 0.1, + var_threshold = 1.5, + var_number = NULL, + random_subset = NULL, + set_seed = TRUE, + seed_number = 1234, + show_plot = NULL, + return_plot = NULL, + save_plot = NULL, + save_param = list(), + default_save_name = "HVFplot", + return_gobject = TRUE) { + # NSE vars + selected <- feats <- var <- NULL + + # determine whether to use parallel functions + # Do not use future if future packages are not installed + # Do not use future if plan is "sequential" + has_future <- requireNamespace("future.apply", quietly = TRUE) && + requireNamespace("future", quietly = TRUE) + use_parallel <- ifelse(has_future, + !("sequential" %in% class(future::plan())), + FALSE + ) + # Set feat_type and spat_unit + spat_unit <- set_default_spat_unit( + gobject = gobject, + spat_unit = spat_unit + ) + feat_type <- set_default_feat_type( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type + ) + # expression values to be used + values <- match.arg( + expression_values, + unique(c("normalized", "scaled", "custom", expression_values))) + expr_values <- getExpression( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + values = values, + output = "matrix" + ) + # not advised + if (isTRUE(reverse_log_scale)) { + expr_values <- (logbase^expr_values) - 1 + } - ## print plot - if(isTRUE(show_plot)) { - print(pl) - } + # create a random subset if random_subset is not NULL + if (!is.null(random_subset)) { + if (isTRUE(set_seed)) set.seed(seed = seed_number) - ## save plot - if(isTRUE(save_plot)) { - do.call(GiottoVisuals::all_plots_save_function, c(list(gobject = gobject, plot_object = pl, default_save_name = default_save_name), save_param)) - } + random_selection <- sort(sample(1:ncol(expr_values), random_subset)) + expr_values <- expr_values[, random_selection] - ## return plot - if(isTRUE(return_plot)) { - if(isTRUE(return_gobject)) { - cat('return_plot = TRUE and return_gobject = TRUE \n - plot will not be returned to object, but can still be saved with save_plot = TRUE or manually \n') - } else { - return(pl) + if (isTRUE(set_seed)) GiottoUtils::random_seed() } - } - if(isTRUE(return_gobject)) { - # add HVG metadata to feat_metadata - feat_metadata <- getFeatureMetadata(gobject, - spat_unit = spat_unit, - feat_type = feat_type, - output = 'featMetaObj', - copy_obj = TRUE) + # print, return and save parameters + show_plot <- ifelse(is.na(show_plot), + readGiottoInstructions(gobject, param = "show_plot"), + show_plot) + save_plot <- ifelse(is.na(save_plot), + readGiottoInstructions(gobject, param = "save_plot"), + save_plot) + return_plot <- ifelse(is.na(return_plot), + readGiottoInstructions(gobject, param = "return_plot"), + return_plot) - column_names_feat_metadata = colnames(feat_metadata[]) - if(HVFname %in% column_names_feat_metadata) { - cat('\n ', HVFname, ' has already been used, will be overwritten \n') - feat_metadata[][, eval(HVFname) := NULL] + # method to use + method <- match.arg( + method, choices = c("cov_groups", "cov_loess", "var_p_resid")) + # select function to use based on whether future parallelization is planned + calc_cov_fun <- ifelse( + use_parallel, + .calc_expr_cov_stats_parallel, + .calc_expr_cov_stats + ) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - gobject = 
setFeatureMetadata(gobject, - x = feat_metadata, - verbose = FALSE, - initialize = FALSE) - ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### - } + results <- switch(method, + "var_p_resid" = { + .calc_var_hvf( + scaled_matrix = expr_values, + var_threshold = var_threshold, + var_number = var_number, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot, + use_parallel = use_parallel + ) + }, + "cov_groups" = { + calc_cov_fun(expr_values, expression_threshold) %>% + .calc_cov_group_hvf( + nr_expression_groups = nr_expression_groups, + zscore_threshold = zscore_threshold, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot + ) + }, + "cov_loess" = { + calc_cov_fun(expr_values, expression_threshold) %>% + .calc_cov_loess_hvf( + difference_in_cov = difference_in_cov, + show_plot = show_plot, + return_plot = return_plot, + save_plot = save_plot + ) + } + ) - if(method == 'var_p_resid') { - HVGfeats = feat_in_cells_detected[,.(feats, var, selected)] - data.table::setnames(HVGfeats, 'selected', HVFname) - } else { - HVGfeats = feat_in_cells_detected[,.(feats, selected)] - data.table::setnames(HVGfeats, 'selected', HVFname) - } + ## unpack results + feat_in_cells_detected <- results[["dt"]] + pl <- results[["pl"]] - gobject = addFeatMetadata(gobject = gobject, - spat_unit = spat_unit, - feat_type = feat_type, - new_metadata = HVGfeats, - by_column = TRUE, - column_feat_ID = 'feats') - ## update parameters used ## - gobject = update_giotto_params(gobject, description = '_hvf') - return(gobject) + ## print plot + if (isTRUE(show_plot)) { + print(pl) + } + + ## save plot + if (isTRUE(save_plot)) { + do.call( + GiottoVisuals::all_plots_save_function, + c(list(gobject = gobject, plot_object = pl, + default_save_name = default_save_name), save_param)) + } + + ## return plot + if (isTRUE(return_plot)) { + if (isTRUE(return_gobject)) { + message("return_plot = TRUE and return_gobject = TRUE \n + plot will not be returned to object, but can still be + saved with save_plot = TRUE or manually") + } else { + return(pl) + } + } - } else { - return(feat_in_cells_detected) - } + if (isTRUE(return_gobject)) { + # add HVG metadata to feat_metadata + feat_metadata <- getFeatureMetadata(gobject, + spat_unit = spat_unit, + feat_type = feat_type, + output = "featMetaObj", + copy_obj = TRUE + ) + + column_names_feat_metadata <- colnames(feat_metadata[]) + + if (HVFname %in% column_names_feat_metadata) { + cat(HVFname, " has already been used, will be overwritten") + feat_metadata[][, eval(HVFname) := NULL] + + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + gobject <- setFeatureMetadata(gobject, + x = feat_metadata, + verbose = FALSE, + initialize = FALSE + ) + ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### + } + + if (method == "var_p_resid") { + HVGfeats <- feat_in_cells_detected[, .(feats, var, selected)] + data.table::setnames(HVGfeats, "selected", HVFname) + } else { + HVGfeats <- feat_in_cells_detected[, .(feats, selected)] + data.table::setnames(HVGfeats, "selected", HVFname) + } + + + gobject <- addFeatMetadata( + gobject = gobject, + spat_unit = spat_unit, + feat_type = feat_type, + new_metadata = HVGfeats, + by_column = TRUE, + column_feat_ID = "feats" + ) + + ## update parameters used ## + gobject <- update_giotto_params(gobject, description = "_hvf") + + return(gobject) + } else { + return(feat_in_cells_detected) + } } @@ -451,49 +486,83 @@ calculateHVF <- function(gobject, # plot generation 
#### -.create_cov_group_hvf_plot = function(feat_in_cells_detected, nr_expression_groups) { - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() + - ggplot2::theme(axis.title = ggplot2::element_text(size = 14), - axis.text = ggplot2::element_text(size = 12)) - pl <- pl + ggplot2::geom_point(data = feat_in_cells_detected, ggplot2::aes_string(x = 'mean_expr', y = 'cov', color = 'selected')) - pl <- pl + ggplot2::scale_color_manual(values = c(no = 'lightgrey', yes = 'orange'), - guide = ggplot2::guide_legend(title = 'HVF', - override.aes = list(size=5))) - pl <- pl + ggplot2::facet_wrap(~expr_groups, ncol = nr_expression_groups, scales = 'free_x') - pl <- pl + ggplot2::theme(axis.text.x = ggplot2::element_blank(), - strip.text = ggplot2::element_text(size = 4)) - pl <- pl + ggplot2::labs(x = 'expression groups', y = 'cov') - pl +.create_cov_group_hvf_plot <- function( + feat_in_cells_detected, nr_expression_groups) { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + + ggplot2::theme( + axis.title = ggplot2::element_text(size = 14), + axis.text = ggplot2::element_text(size = 12) + ) + pl <- pl + ggplot2::geom_point( + data = feat_in_cells_detected, + ggplot2::aes_string(x = "mean_expr", y = "cov", color = "selected")) + pl <- pl + ggplot2::scale_color_manual( + values = c(no = "lightgrey", yes = "orange"), + guide = ggplot2::guide_legend( + title = "HVF", + override.aes = list(size = 5) + ) + ) + pl <- pl + ggplot2::facet_wrap( + ~expr_groups, ncol = nr_expression_groups, scales = "free_x") + pl <- pl + ggplot2::theme( + axis.text.x = ggplot2::element_blank(), + strip.text = ggplot2::element_text(size = 4) + ) + pl <- pl + ggplot2::labs(x = "expression groups", y = "cov") + pl } -.create_cov_loess_hvf_plot = function(feat_in_cells_detected, difference_in_cov, var_col) { - pl <- ggplot2::ggplot() - pl <- pl + ggplot2::theme_classic() + - ggplot2::theme(axis.title = ggplot2::element_text(size = 14), - axis.text = ggplot2::element_text(size = 12)) - pl <- pl + ggplot2::geom_point(data = feat_in_cells_detected, ggplot2::aes_string(x = 'log(mean_expr)', y = var_col, color = 'selected')) - pl <- pl + ggplot2::geom_line(data = feat_in_cells_detected, ggplot2::aes_string(x = 'log(mean_expr)', y = 'pred_cov_feats'), color = 'blue') - hvg_line = paste0('pred_cov_feats+',difference_in_cov) - pl <- pl + ggplot2::geom_line(data = feat_in_cells_detected, ggplot2::aes_string(x = 'log(mean_expr)', y = hvg_line), linetype = 2) - pl <- pl + ggplot2::labs(x = 'log(mean expression)', y = var_col) - pl <- pl + ggplot2::scale_color_manual(values = c(no = 'lightgrey', yes = 'orange'), - guide = ggplot2::guide_legend(title = 'HVF', - override.aes = list(size=5))) - pl +.create_cov_loess_hvf_plot <- function( + feat_in_cells_detected, difference_in_cov, var_col) { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::theme_classic() + + ggplot2::theme( + axis.title = ggplot2::element_text(size = 14), + axis.text = ggplot2::element_text(size = 12) + ) + pl <- pl + ggplot2::geom_point( + data = feat_in_cells_detected, + ggplot2::aes_string(x = "log(mean_expr)", y = var_col, + color = "selected")) + pl <- pl + ggplot2::geom_line( + data = feat_in_cells_detected, + ggplot2::aes_string(x = "log(mean_expr)", y = "pred_cov_feats"), + color = "blue") + hvg_line <- paste0("pred_cov_feats+", difference_in_cov) + pl <- pl + ggplot2::geom_line( + data = feat_in_cells_detected, + ggplot2::aes_string(x = "log(mean_expr)", y = hvg_line), linetype = 2) + pl <- pl + ggplot2::labs(x = "log(mean expression)", y = 
var_col) + pl <- pl + ggplot2::scale_color_manual( + values = c(no = "lightgrey", yes = "orange"), + guide = ggplot2::guide_legend( + title = "HVF", + override.aes = list(size = 5) + ) + ) + pl } -.create_calc_var_hvf_plot = function(dt_res) { - pl = ggplot2::ggplot() - pl = pl + ggplot2::geom_point(data = dt_res, aes_string(x = 'rank', y = 'var', color = 'selected')) - pl = pl + ggplot2::scale_x_reverse() - pl = pl + ggplot2::theme_classic() + ggplot2::theme(axis.title = ggplot2::element_text(size = 14), - axis.text = ggplot2::element_text(size = 12)) - pl = pl + ggplot2::scale_color_manual(values = c(no = 'lightgrey', yes = 'orange'), - guide = ggplot2::guide_legend(title = 'HVF', - override.aes = list(size=5))) - pl = pl + ggplot2::labs(x = 'feature rank', y = 'variance') - pl +.create_calc_var_hvf_plot <- function(dt_res) { + pl <- ggplot2::ggplot() + pl <- pl + ggplot2::geom_point( + data = dt_res, aes_string(x = "rank", y = "var", color = "selected")) + pl <- pl + ggplot2::scale_x_reverse() + pl <- pl + ggplot2::theme_classic() + ggplot2::theme( + axis.title = ggplot2::element_text(size = 14), + axis.text = ggplot2::element_text(size = 12) + ) + pl <- pl + ggplot2::scale_color_manual( + values = c(no = "lightgrey", yes = "orange"), + guide = ggplot2::guide_legend( + title = "HVF", + override.aes = list(size = 5) + ) + ) + pl <- pl + ggplot2::labs(x = "feature rank", y = "variance") + pl } diff --git a/R/wnn.R b/R/wnn.R index 4f704142c..27e623de7 100644 --- a/R/wnn.R +++ b/R/wnn.R @@ -13,423 +13,484 @@ #' @param w_name_modality_2 name for modality 2 weights #' @param verbose be verbose #' -#' @return A Giotto object with integrated UMAP (integrated.umap) within the dimension_reduction slot and Leiden clusters (integrated_leiden_clus) in the cellular metadata. +#' @return A Giotto object with integrated UMAP (integrated.umap) within the +#' dimension_reduction slot and Leiden clusters (integrated_leiden_clus) in the +#' cellular metadata. 
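A hedged usage sketch for runWNN() (not part of this changeset): `g` is a hypothetical Giotto object that already holds PCA reductions and kNN networks for both modalities under the names used below.

    g <- runWNN(
        gobject = g,
        spat_unit = "cell",
        modality_1 = "rna",
        modality_2 = "protein",
        pca_name_modality_1 = "rna.pca",
        pca_name_modality_2 = "protein.pca",
        k = 20,
        verbose = TRUE
    )
    # the weighted affinity matrix and per-modality weights are stored in the
    # multiomics slot and can be retrieved afterwards
    theta_w <- get_multiomics(g,
        spat_unit = "cell",
        feat_type = "rna_protein",
        integration_method = "WNN",
        result_name = "theta_weighted_matrix"
    )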
#' @export runWNN <- function(gobject, - spat_unit = 'cell', - modality_1 = 'rna', - modality_2 = 'protein', - pca_name_modality_1 = 'rna.pca', - pca_name_modality_2 = 'protein.pca', - k = 20, - integrated_feat_type = NULL, - matrix_result_name = NULL, - w_name_modality_1 = NULL, - w_name_modality_2 = NULL, - verbose = FALSE) { - - # validate Giotto object - if (!inherits(gobject, "giotto")) { - stop("gobject needs to be a giotto object") - } - - # validate modalities - if(!modality_1 %in% names(gobject@dimension_reduction$cells[[spat_unit]]) || !modality_2 %in% names(gobject@dimension_reduction$cells[[spat_unit]]) ) { - stop(paste(modality_1, "and", modality_2, " pca must exist")) - } - - # extract PCA - - ## modality 1 - kNN_1 <- get_NearestNetwork(gobject, - spat_unit = spat_unit, - feat_type = modality_1, - nn_network_to_use = "kNN") - kNN_1 <- slot(kNN_1, "igraph") - - pca_1 <- get_dimReduction(gobject, - spat_unit = spat_unit, - feat_type = modality_1, - reduction = "cells", - reduction_method = "pca", - name = pca_name_modality_1) - pca_1 <- slot(pca_1, "coordinates") - - ## modality 2 - kNN_2 <- get_NearestNetwork(gobject, - spat_unit = spat_unit, - feat_type = modality_2, - nn_network_to_use = "kNN") - kNN_2 <- slot(kNN_2, "igraph") - - pca_2 <- get_dimReduction(gobject, - spat_unit = "cell", - feat_type = modality_2, - reduction = "cells", - reduction_method = "pca", - name = pca_name_modality_2) - pca_2 <- slot(pca_2, "coordinates") - - ## get cell names - cell_names <- unique(igraph::get.edgelist(kNN_1)[,1]) - - ######################## distances calculation ############################ - - print(paste("Calculating",modality_1,"-",modality_1,"distance")) - - ### distances modality1 modality1 - cell_distances_1_1 <- list() - - for (cell_a in cell_names) { - - my_kNN <- kNN_1[[cell_a]][[cell_a]] - - cell_distances_1_1[[cell_a]] <- rep(0, k) - names(cell_distances_1_1[[cell_a]]) <- names(my_kNN) - - for (cell_i in names(my_kNN)) { - dimensions_cell_a_i <- pca_1[c(cell_a, cell_i),] - cell_distances_1_1[[cell_a]][cell_i] <- sqrt(sum((dimensions_cell_a_i[1,] - dimensions_cell_a_i[2,])^2)) + spat_unit = "cell", + modality_1 = "rna", + modality_2 = "protein", + pca_name_modality_1 = "rna.pca", + pca_name_modality_2 = "protein.pca", + k = 20, + integrated_feat_type = NULL, + matrix_result_name = NULL, + w_name_modality_1 = NULL, + w_name_modality_2 = NULL, + verbose = FALSE) { + # validate Giotto object + if (!inherits(gobject, "giotto")) { + stop("gobject needs to be a giotto object") } - } - ### distances modality2 modality2 - - print(paste("Calculating",modality_2,"-",modality_2,"distance")) - - cell_distances_2_2 <- list() - - for (cell_a in cell_names) { - my_kNN <- kNN_2[[cell_a]][[cell_a]] - - cell_distances_2_2[[cell_a]] <- rep(0, k) - names(cell_distances_2_2[[cell_a]]) <- names(my_kNN) - - for (cell_i in names(my_kNN)) { - dimensions_cell_a_i <- pca_2[c(cell_a, cell_i),] - cell_distances_2_2[[cell_a]][cell_i] <- sqrt(sum((dimensions_cell_a_i[1,] - dimensions_cell_a_i[2,])^2)) + # validate modalities + if (!modality_1 %in% names( + gobject@dimension_reduction$cells[[spat_unit]]) || + !modality_2 %in% names(gobject@dimension_reduction$cells[[spat_unit]])) + { + stop(paste(modality_1, "and", modality_2, " pca must exist")) } - } - - ########################### all cell-cell distances ############################ - - ## modality1 modality1 - - if(verbose) print(paste("Calculating low dimensional cell-cell distances for", modality_1)) - - all_cell_distances_1_1 = dist(pca_1) - 
all_cell_distances_1_1 = as.matrix(all_cell_distances_1_1) - - ## modality2 modality2 - - if(verbose) print(paste("Calculating low dimensional cell-cell distances for", modality_2)) + # extract PCA + + ## modality 1 + kNN_1 <- getNearestNetwork(gobject, + spat_unit = spat_unit, + feat_type = modality_1, + nn_type = "kNN" + ) + kNN_1 <- slot(kNN_1, "igraph") + + pca_1 <- getDimReduction(gobject, + spat_unit = spat_unit, + feat_type = modality_1, + reduction = "cells", + reduction_method = "pca", + name = pca_name_modality_1 + ) + pca_1 <- slot(pca_1, "coordinates") + + ## modality 2 + kNN_2 <- getNearestNetwork(gobject, + spat_unit = spat_unit, + feat_type = modality_2, + nn_type = "kNN" + ) + kNN_2 <- slot(kNN_2, "igraph") + + pca_2 <- getDimReduction(gobject, + spat_unit = "cell", + feat_type = modality_2, + reduction = "cells", + reduction_method = "pca", + name = pca_name_modality_2 + ) + pca_2 <- slot(pca_2, "coordinates") + + ## get cell names + cell_names <- unique(igraph::get.edgelist(kNN_1)[, 1]) + + ######################## distances calculation ############################ + + message(paste("Calculating", modality_1, "-", modality_1, "distance")) + + ### distances modality1 modality1 + cell_distances_1_1 <- list() + + for (cell_a in cell_names) { + my_kNN <- kNN_1[[cell_a]][[cell_a]] + + cell_distances_1_1[[cell_a]] <- rep(0, k) + names(cell_distances_1_1[[cell_a]]) <- names(my_kNN) + + for (cell_i in names(my_kNN)) { + dimensions_cell_a_i <- pca_1[c(cell_a, cell_i), ] + cell_distances_1_1[[cell_a]][cell_i] <- sqrt(sum(( + dimensions_cell_a_i[1, ] - dimensions_cell_a_i[2, ])^2)) + } + } - all_cell_distances_2_2 = dist(pca_2) - all_cell_distances_2_2 = as.matrix(all_cell_distances_2_2) + ### distances modality2 modality2 + message(paste("Calculating", modality_2, "-", modality_2, "distance")) - ######################## within-modality prediction ############################ + cell_distances_2_2 <- list() - if(verbose) print("Calculating within-modality prediction") + for (cell_a in cell_names) { + my_kNN <- kNN_2[[cell_a]][[cell_a]] - ### predicted modality1 modality1 - predicted_1_1 <- list() + cell_distances_2_2[[cell_a]] <- rep(0, k) + names(cell_distances_2_2[[cell_a]]) <- names(my_kNN) - for (cell_a in cell_names) { - dimensions_cell_a <- pca_1[kNN_1[[cell_a]][[cell_a]],] + for (cell_i in names(my_kNN)) { + dimensions_cell_a_i <- pca_2[c(cell_a, cell_i), ] + cell_distances_2_2[[cell_a]][cell_i] <- sqrt(sum(( + dimensions_cell_a_i[1, ] - dimensions_cell_a_i[2, ])^2)) + } + } - predicted_1_1[[cell_a]] <- colSums(dimensions_cell_a)/k - } + ########################### all cell-cell distances ######################## - ### predicted modality2 modality2 - predicted_2_2 <- list() + ## modality1 modality1 - for (cell_a in cell_names) { - dimensions_cell_a <- pca_2[kNN_2[[cell_a]][[cell_a]],] + if (verbose) + message(paste("Calculating low dimensional cell-cell distances for", + modality_1)) - predicted_2_2[[cell_a]] <- colSums(dimensions_cell_a)/k - } + all_cell_distances_1_1 <- dist(pca_1) + all_cell_distances_1_1 <- as.matrix(all_cell_distances_1_1) - ######################## cross-modality prediction ############################ + ## modality2 modality2 - if(verbose) print("Calculating cross-modality prediction") + if (verbose) + message(paste("Calculating low dimensional cell-cell distances for", + modality_2)) - ## predicted modality1 modality2 - predicted_1_2 <- list() - for (cell_a in cell_names) { - dimensions_cell_a <- pca_1[kNN_2[[cell_a]][[cell_a]],] + 
all_cell_distances_2_2 <- dist(pca_2) + all_cell_distances_2_2 <- as.matrix(all_cell_distances_2_2) - predicted_1_2[[cell_a]] <- colSums(dimensions_cell_a)/k - } - ## predicted modality2 modality1 - predicted_2_1 <- list() + ######################## within-modality prediction ######################## - for (cell_a in cell_names) { - dimensions_cell_a <- pca_2[kNN_1[[cell_a]][[cell_a]],] + if (verbose) message("Calculating within-modality prediction") - predicted_2_1[[cell_a]] <- colSums(dimensions_cell_a)/k - } + ### predicted modality1 modality1 + predicted_1_1 <- list() - ###################### calculate jaccard similarities ########################## + for (cell_a in cell_names) { + dimensions_cell_a <- pca_1[kNN_1[[cell_a]][[cell_a]], ] - if(verbose) print("Calculating Jaccard similarities") + predicted_1_1[[cell_a]] <- colSums(dimensions_cell_a) / k + } - ## modality1 modality1 - sNN_1 <- createNearestNetwork(gobject, - spat_unit = "cell", - feat_type = modality_1, - type = "sNN", - dim_reduction_to_use = "pca", - dim_reduction_name = pca_name_modality_1, - dimensions_to_use = 1:100, - return_gobject = FALSE, - minimum_shared = 1, - k = 20) + ### predicted modality2 modality2 + predicted_2_2 <- list() - sNN_1 <- igraph::as_data_frame(sNN_1) + for (cell_a in cell_names) { + dimensions_cell_a <- pca_2[kNN_2[[cell_a]][[cell_a]], ] - ## modality2 modality2 + predicted_2_2[[cell_a]] <- colSums(dimensions_cell_a) / k + } - sNN_2 <- createNearestNetwork(gobject, - spat_unit = "cell", - feat_type = modality_2, - type = "sNN", - dim_reduction_to_use = "pca", - dim_reduction_name = pca_name_modality_2, - dimensions_to_use = 1:100, - return_gobject = FALSE, - minimum_shared = 1, - k = 20) + ######################## cross-modality prediction ######################### - sNN_2 <- igraph::as_data_frame(sNN_2) + if (verbose) message("Calculating cross-modality prediction") - if(verbose) print("Calculating kernel bandwidths") + ## predicted modality1 modality2 + predicted_1_2 <- list() - # cell-specific kernel bandwidth. 
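The per-cell kernel bandwidth computed in the loops below is the mean of a cell's distances to the neighbours listed for it in the sNN table (the "20 small jaccard values"), with a fallback to its 20 largest distances when that table does not have 20 rows. A standalone sketch of that intent follows; `D` (a cell-by-cell distance matrix) and `snn_dt` (an sNN edge table with `from`/`to` columns) are illustrative names, not objects from this diff. Note that the guard as written, `nrow(jaccard_values == 20)`, takes the row count of a logical matrix and is therefore non-zero whenever any rows exist; the intended comparison is presumably `nrow(jaccard_values) == 20`, which is what the sketch uses.

    sigma_for_cell <- function(cell, D, snn_dt, n_far = 20) {
        nbrs <- snn_dt[snn_dt$from == cell, ]
        if (nrow(nbrs) == n_far) {                # assumed intent of the check
            far_d <- D[cell, nbrs$to]
        } else {
            far_d <- tail(sort(D[cell, ]), n_far) # fallback: largest distances
        }
        mean(far_d)                               # cell-specific kernel bandwidth
    }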
+ for (cell_a in cell_names) { + dimensions_cell_a <- pca_1[kNN_2[[cell_a]][[cell_a]], ] - ## modality1 - modality1_sigma_i <- numeric() + predicted_1_2[[cell_a]] <- colSums(dimensions_cell_a) / k + } - for(cell_a in cell_names) { - ### 20 small jaccard values - jaccard_values <- sNN_1[sNN_1$from == cell_a,] + ## predicted modality2 modality1 + predicted_2_1 <- list() - if (nrow(jaccard_values == 20)) { + for (cell_a in cell_names) { + dimensions_cell_a <- pca_2[kNN_1[[cell_a]][[cell_a]], ] - further_cell_cell_distances <- all_cell_distances_1_1[cell_a,jaccard_values$to] + predicted_2_1[[cell_a]] <- colSums(dimensions_cell_a) / k + } - } else { - further_cell_cell_distances <- tail(sort(all_cell_distances_1_1[cell_a,]), 20) + ###################### calculate jaccard similarities ###################### + + if (verbose) message("Calculating Jaccard similarities") + + ## modality1 modality1 + sNN_1 <- createNearestNetwork(gobject, + spat_unit = "cell", + feat_type = modality_1, + type = "sNN", + dim_reduction_to_use = "pca", + dim_reduction_name = pca_name_modality_1, + dimensions_to_use = 1:100, + return_gobject = FALSE, + minimum_shared = 1, + k = 20 + ) + + sNN_1 <- igraph::as_data_frame(sNN_1) + + ## modality2 modality2 + + sNN_2 <- createNearestNetwork(gobject, + spat_unit = "cell", + feat_type = modality_2, + type = "sNN", + dim_reduction_to_use = "pca", + dim_reduction_name = pca_name_modality_2, + dimensions_to_use = 1:100, + return_gobject = FALSE, + minimum_shared = 1, + k = 20 + ) + + sNN_2 <- igraph::as_data_frame(sNN_2) + + if (verbose) message("Calculating kernel bandwidths") + + # cell-specific kernel bandwidth. + + ## modality1 + modality1_sigma_i <- numeric() + + for (cell_a in cell_names) { + ### 20 small jaccard values + jaccard_values <- sNN_1[sNN_1$from == cell_a, ] + + if (nrow(jaccard_values == 20)) { + further_cell_cell_distances <- all_cell_distances_1_1[ + cell_a, jaccard_values$to] + } else { + further_cell_cell_distances <- tail(sort(all_cell_distances_1_1[ + cell_a, ]), 20) + } + + modality1_sigma_i[cell_a] <- mean(further_cell_cell_distances) + # cell-specific kernel bandwidth. } - modality1_sigma_i[cell_a] <- mean(further_cell_cell_distances) # cell-specific kernel bandwidth. - } + ## modality2 - ## modality2 + modality2_sigma_i <- numeric() - modality2_sigma_i <- numeric() + for (cell_a in cell_names) { + ### 20 small jaccard values + jaccard_values <- sNN_2[sNN_2$from == cell_a, ] - for(cell_a in cell_names) { - ### 20 small jaccard values - jaccard_values <- sNN_2[sNN_2$from == cell_a,] + if (nrow(jaccard_values == 20)) { + further_cell_cell_distances <- all_cell_distances_2_2[ + cell_a, jaccard_values$to] + } else { + further_cell_cell_distances <- tail(sort(all_cell_distances_2_2[ + cell_a, ]), 20) + } - if (nrow(jaccard_values == 20)) { - further_cell_cell_distances <- all_cell_distances_2_2[cell_a,jaccard_values$to] - } else { - further_cell_cell_distances <- tail(sort(all_cell_distances_2_2[cell_a,]), 20) + modality2_sigma_i[cell_a] <- mean(further_cell_cell_distances) + # cell-specific kernel bandwidth. } - modality2_sigma_i[cell_a] <- mean(further_cell_cell_distances) # cell-specific kernel bandwidth. 
- } - - - ###################### cell-specific modality weights ########################## - if(verbose) print("Calculating modality weights") + ###################### cell-specific modality weights ###################### - ## modality1 modality1 - theta_1_1 <- list() + if (verbose) message("Calculating modality weights") - for (cell_a in cell_names) { - modality1_i <- pca_1[cell_a,] # profile of current cell - d_modality1_i_modality2_predicted <- sqrt(sum((modality1_i - predicted_1_1[[cell_a]])^2)) + ## modality1 modality1 + theta_1_1 <- list() - first_knn <- names(sort(cell_distances_1_1[[cell_a]]))[1] - modality1_knn1 <- pca_1[first_knn,] # profile of the nearest neighbor - d_modality1_i_modality1_knn1 <- sqrt(sum((modality1_i - modality1_knn1)^2)) + for (cell_a in cell_names) { + modality1_i <- pca_1[cell_a, ] # profile of current cell + d_modality1_i_modality2_predicted <- sqrt(sum(( + modality1_i - predicted_1_1[[cell_a]])^2)) - difference_distances <- d_modality1_i_modality2_predicted - d_modality1_i_modality1_knn1 - max_value <- max(c(difference_distances, 0)) + first_knn <- names(sort(cell_distances_1_1[[cell_a]]))[1] + modality1_knn1 <- pca_1[first_knn, ] # profile of the nearest neighbor + d_modality1_i_modality1_knn1 <- sqrt(sum(( + modality1_i - modality1_knn1)^2)) - theta_1_1[[cell_a]] <- exp( (-max_value)/(modality1_sigma_i[cell_a] - d_modality1_i_modality1_knn1) ) - } + difference_distances <- d_modality1_i_modality2_predicted - + d_modality1_i_modality1_knn1 + max_value <- max(c(difference_distances, 0)) - ## modality2 modality2 - theta_modality2_modality2 <- list() - - for (cell_a in cell_names) { - modality2_i <- pca_2[cell_a,] # profile of current cell - d_modality2_i_modality2_predicted <- sqrt(sum((modality2_i - predicted_2_2[[cell_a]])^2)) + theta_1_1[[cell_a]] <- exp(( + -max_value) / (modality1_sigma_i[cell_a] - + d_modality1_i_modality1_knn1)) + } - first_knn <- names(sort(cell_distances_2_2[[cell_a]]))[1] - modality2_knn1 <- pca_2[first_knn,] # profile of the nearest neighbor - d_modality2_i_modality2_knn1 <- sqrt(sum((modality2_i - modality2_knn1)^2)) + ## modality2 modality2 + theta_modality2_modality2 <- list() - difference_distances <- d_modality2_i_modality2_predicted - d_modality2_i_modality2_knn1 - max_value <- max(c(difference_distances, 0)) + for (cell_a in cell_names) { + modality2_i <- pca_2[cell_a, ] # profile of current cell + d_modality2_i_modality2_predicted <- sqrt(sum(( + modality2_i - predicted_2_2[[cell_a]])^2)) - theta_modality2_modality2[[cell_a]] <- exp( (-max_value)/(modality2_sigma_i[cell_a] - d_modality2_i_modality2_knn1) ) - } + first_knn <- names(sort(cell_distances_2_2[[cell_a]]))[1] + modality2_knn1 <- pca_2[first_knn, ] # profile of the nearest neighbor + d_modality2_i_modality2_knn1 <- sqrt(sum(( + modality2_i - modality2_knn1)^2)) + difference_distances <- d_modality2_i_modality2_predicted - + d_modality2_i_modality2_knn1 + max_value <- max(c(difference_distances, 0)) - ## modality1 modality2 - theta_modality1_modality2 <- list() + theta_modality2_modality2[[cell_a]] <- exp(( + -max_value) / (modality2_sigma_i[cell_a] - + d_modality2_i_modality2_knn1)) + } - for (cell_a in cell_names) { - modality1_i <- pca_1[cell_a,] # profile of current cell - d_modality1_i_modality2_predicted <- sqrt(sum((modality1_i - predicted_1_2[[cell_a]])^2)) - first_knn <- names(sort(cell_distances_1_1[[cell_a]]))[1] - modality1_knn1 <- pca_1[first_knn,] # profile of the nearest neighbor - d_modality1_i_modality1_knn1 <- sqrt(sum((modality1_i - 
modality1_knn1)^2)) + ## modality1 modality2 + theta_modality1_modality2 <- list() - difference_distances <- d_modality1_i_modality2_predicted - d_modality1_i_modality1_knn1 - max_value <- max(c(difference_distances, 0)) + for (cell_a in cell_names) { + modality1_i <- pca_1[cell_a, ] # profile of current cell + d_modality1_i_modality2_predicted <- sqrt(sum(( + modality1_i - predicted_1_2[[cell_a]])^2)) - theta_modality1_modality2[[cell_a]] <- exp( (-max_value)/(modality1_sigma_i[cell_a] - d_modality1_i_modality1_knn1) ) - } + first_knn <- names(sort(cell_distances_1_1[[cell_a]]))[1] + modality1_knn1 <- pca_1[first_knn, ] # profile of the nearest neighbor + d_modality1_i_modality1_knn1 <- sqrt(sum(( + modality1_i - modality1_knn1)^2)) + difference_distances <- d_modality1_i_modality2_predicted - + d_modality1_i_modality1_knn1 + max_value <- max(c(difference_distances, 0)) - ## modality2 modality1 - theta_modality2_modality1 <- list() + theta_modality1_modality2[[cell_a]] <- exp(( + -max_value) / (modality1_sigma_i[cell_a] - + d_modality1_i_modality1_knn1)) + } - for (cell_a in cell_names) { - modality2_i <- pca_2[cell_a,] # profile of current cell - d_modality2_i_modality1_predicted <- sqrt(sum((modality2_i - predicted_2_1[[cell_a]])^2)) - first_knn <- names(sort(cell_distances_2_2[[cell_a]]))[1] - modality2_knn1 <- pca_2[first_knn,] # profile of the nearest neighbor - d_modality2_i_modality2_knn1 <- sqrt(sum((modality2_i - modality2_knn1)^2)) + ## modality2 modality1 + theta_modality2_modality1 <- list() - difference_distances <- d_modality2_i_modality1_predicted - d_modality2_i_modality2_knn1 - max_value <- max(c(difference_distances, 0)) + for (cell_a in cell_names) { + modality2_i <- pca_2[cell_a, ] # profile of current cell + d_modality2_i_modality1_predicted <- sqrt(sum(( + modality2_i - predicted_2_1[[cell_a]])^2)) - theta_modality2_modality1[[cell_a]] <- exp( (-max_value)/(modality2_sigma_i[cell_a] - d_modality2_i_modality2_knn1) ) - } + first_knn <- names(sort(cell_distances_2_2[[cell_a]]))[1] + modality2_knn1 <- pca_2[first_knn, ] # profile of the nearest neighbor + d_modality2_i_modality2_knn1 <- sqrt(sum(( + modality2_i - modality2_knn1)^2)) + difference_distances <- d_modality2_i_modality1_predicted - + d_modality2_i_modality2_knn1 + max_value <- max(c(difference_distances, 0)) - ##################### ratio of affinities ###################################### + theta_modality2_modality1[[cell_a]] <- exp(( + -max_value) / (modality2_sigma_i[cell_a] - + d_modality2_i_modality2_knn1)) + } - if(verbose) print("Calculating WNN") - epsilon = 10^-4 + ##################### ratio of affinities ################################## - ## modality1 - ratio_modality1 <- list() + if (verbose) message("Calculating WNN") - for (cell_a in cell_names) { - ratio_modality1[[cell_a]] <- theta_1_1[[cell_a]]/(theta_modality1_modality2[[cell_a]] + epsilon) - } + epsilon <- 10^-4 + ## modality1 + ratio_modality1 <- list() - ## modality2 - ratio_modality2 <- list() + for (cell_a in cell_names) { + ratio_modality1[[cell_a]] <- theta_1_1[[cell_a]] / + (theta_modality1_modality2[[cell_a]] + epsilon) + } - for (cell_a in cell_names) { - ratio_modality2[[cell_a]] <- theta_modality2_modality2[[cell_a]]/(theta_modality2_modality1[[cell_a]] + epsilon) - } + ## modality2 + ratio_modality2 <- list() - ########################### normalization ###################################### + for (cell_a in cell_names) { + ratio_modality2[[cell_a]] <- theta_modality2_modality2[[cell_a]] / + 
(theta_modality2_modality1[[cell_a]] + epsilon) + } - if(verbose) print("Calculating WNN normalization") - w_modality1 <- rep(0, length(cell_names)) - names(w_modality1) = cell_names + ########################### normalization ################################## - for (cell_a in cell_names) { - w_modality1[cell_a] <- exp(ratio_modality1[[cell_a]])/(exp(ratio_modality1[[cell_a]]) + exp(ratio_modality2[[cell_a]])) - } + if (verbose) message("Calculating WNN normalization") - w_modality2 <- rep(0, length(cell_names)) - names(w_modality2) = cell_names + w_modality1 <- rep(0, length(cell_names)) + names(w_modality1) <- cell_names - for (cell_a in cell_names) { - w_modality2[cell_a] <- exp(ratio_modality2[[cell_a]])/(exp(ratio_modality1[[cell_a]]) + exp(ratio_modality2[[cell_a]])) - } + for (cell_a in cell_names) { + w_modality1[cell_a] <- exp(ratio_modality1[[cell_a]]) / + (exp(ratio_modality1[[cell_a]]) + exp(ratio_modality2[[cell_a]])) + } + w_modality2 <- rep(0, length(cell_names)) + names(w_modality2) <- cell_names - ######################### Calculating a WNN graph ############################## + for (cell_a in cell_names) { + w_modality2[cell_a] <- exp(ratio_modality2[[cell_a]]) / + (exp(ratio_modality1[[cell_a]]) + exp(ratio_modality2[[cell_a]])) + } - if(verbose) print("Calculating WNN graph") - theta_weighted <- matrix(rep(0,length(cell_names)*length(cell_names)), - ncol = length(cell_names), - nrow = length(cell_names)) + ######################### Calculating a WNN graph ########################## - colnames(theta_weighted) <- cell_names - rownames(theta_weighted) <- cell_names + if (verbose) message("Calculating WNN graph") - kernelpower <- 1 + theta_weighted <- matrix(rep(0, length(cell_names) * length(cell_names)), + ncol = length(cell_names), + nrow = length(cell_names) + ) - ## theta_modality1 + colnames(theta_weighted) <- cell_names + rownames(theta_weighted) <- cell_names - theta_modality1_cella_cellb <- exp(-1*(all_cell_distances_1_1/ modality1_sigma_i) ** kernelpower) + kernelpower <- 1 - ## theta_modality2 - theta_modality2_cella_cellb <- exp(-1*(all_cell_distances_2_2/ modality2_sigma_i) ** kernelpower) + ## theta_modality1 - ## theta_weighted - theta_weighted <- w_modality1*theta_modality1_cella_cellb + w_modality2*theta_modality2_cella_cellb + theta_modality1_cella_cellb <- exp(-1 * (all_cell_distances_1_1 / + modality1_sigma_i)**kernelpower) + ## theta_modality2 + theta_modality2_cella_cellb <- exp(-1 * (all_cell_distances_2_2 / + modality2_sigma_i)**kernelpower) - # save theta_weighted + ## theta_weighted + theta_weighted <- w_modality1 * theta_modality1_cella_cellb + + w_modality2 * theta_modality2_cella_cellb - if(verbose) print("Saving WNN results") + # save theta_weighted - ## set integrated feat_type and result name if not provided - if(is.null(integrated_feat_type)) {integrated_feat_type = paste0(modality_1,'_',modality_2)} + if (verbose) message("Saving WNN results") - if(is.null(matrix_result_name)) {matrix_result_name = 'theta_weighted_matrix'} - gobject <- set_multiomics(gobject = gobject, - result = theta_weighted, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = 'WNN', - result_name = matrix_result_name, - verbose = TRUE) + ## set integrated feat_type and result name if not provided + if (is.null(integrated_feat_type)) { + integrated_feat_type <- paste0(modality_1, "_", modality_2) + } + if (is.null(matrix_result_name)) { + matrix_result_name <- "theta_weighted_matrix" + } - # save modalities weight + gobject <- 
set_multiomics( + gobject = gobject, + result = theta_weighted, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = "WNN", + result_name = matrix_result_name, + verbose = TRUE + ) - ## modality 1 - if(is.null(w_name_modality_1)) {w_name_modality_1 = paste0('w_',modality_1)} - gobject <- set_multiomics(gobject = gobject, - result = w_modality1, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = 'WNN', - result_name = w_name_modality_1, - verbose = TRUE) + # save modalities weight - ## modality 2 - if(is.null(w_name_modality_2)) {w_name_modality_2 = paste0('w_',modality_2)} + ## modality 1 + if (is.null(w_name_modality_1)) { + w_name_modality_1 <- paste0("w_", modality_1) + } - gobject <- set_multiomics(gobject = gobject, - result = w_modality2, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = 'WNN', - result_name = w_name_modality_2, - verbose = TRUE) + gobject <- set_multiomics( + gobject = gobject, + result = w_modality1, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = "WNN", + result_name = w_name_modality_1, + verbose = TRUE + ) + + ## modality 2 + if (is.null(w_name_modality_2)) { + w_name_modality_2 <- paste0("w_", modality_2) + } - return(gobject) + gobject <- set_multiomics( + gobject = gobject, + result = w_modality2, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = "WNN", + result_name = w_name_modality_2, + verbose = TRUE + ) + + return(gobject) } @@ -451,134 +512,161 @@ runWNN <- function(gobject, #' @return A Giotto object with integrated UMAP #' @export runIntegratedUMAP <- function(gobject, - spat_unit = "cell", - modality1 = "rna", - modality2 = "protein", - integrated_feat_type = NULL, - integration_method = 'WNN', - matrix_result_name = 'theta_weighted_matrix', - k = 20, - spread = 5, - min_dist = 0.01, - force = FALSE, - ...) 
{ - - if(is.null(integrated_feat_type)) { - integrated_feat_type = paste0(modality1,'_',modality2)} - - theta_weighted <- get_multiomics(gobject, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = integration_method, - result_name = matrix_result_name) - - #theta_weighted <- gobject@dimension_reduction$cells$cell$WNN$theta_weighted - theta_weighted[is.na(theta_weighted)] <- 0 - - if(is.null(gobject@nn_network[[spat_unit]][[modality1]]$kNN$integrated_kNN) || force == TRUE) { - - ################# Calculate integrated Nearest Neighbors ####################### - - print('Calculating integrated Nearest Neighbors') - - cell_names <- colnames(theta_weighted) - - nn_network = dbscan::kNN(x = theta_weighted, k = k, sort = TRUE) - from = to = weight = distance = from_cell_ID = to_cell_ID = shared = NULL - nn_network_dt = data.table::data.table(from = rep(1:nrow(nn_network$id), - k), - to = as.vector(nn_network$id), - weight = 1/(1 + as.vector(nn_network$dist)), - distance = as.vector(nn_network$dist)) - nn_network_dt[, `:=`(from_cell_ID, cell_names[from])] - nn_network_dt[, `:=`(to_cell_ID, cell_names[to])] - all_index = unique(x = c(nn_network_dt$from_cell_ID, nn_network_dt$to_cell_ID)) - - ################################ Create igraph ################################# - - print('Creating igraph') - - nn_network_igraph = igraph::graph_from_data_frame(nn_network_dt[,.(from_cell_ID, to_cell_ID, weight, distance)], - directed = TRUE, - vertices = all_index) - - ## store igraph - nnNetObj = create_nn_net_obj(name = 'integrated_kNN', - nn_type = 'kNN', - igraph = nn_network_igraph, - spat_unit = spat_unit, - feat_type = modality1) - - gobject = set_NearestNetwork(gobject = gobject, - nn_network = nnNetObj, - spat_unit = spat_unit, - feat_type = modality1, - nn_network_to_use = 'kNN', - network_name = 'integrated_kNN') - - ## store nn_network id - gobject <- set_multiomics(gobject = gobject, - result = nn_network$id, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = 'WNN', - result_name = 'integrated_kNN_id', - verbose = TRUE) - - ## store nn_network dist - gobject <- set_multiomics(gobject = gobject, - result = nn_network$dist, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = 'WNN', - result_name = 'integrated_kNN_dist', - verbose = TRUE) - } - - ######################### Calculate integrated UMAP ############################ - - print('Calculating integrated UMAP') - - nn_network_id = get_multiomics(gobject, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = integration_method, - result_name = 'integrated_kNN_id') - - nn_network_dist = get_multiomics(gobject, - spat_unit = spat_unit, - feat_type = integrated_feat_type, - integration_method = integration_method, - result_name = 'integrated_kNN_dist') - - - #### using nn_network pre-calculation - set.seed(4567) - integrated_umap <- uwot::umap(X = theta_weighted, - n_neighbors = k, - nn_method = list(idx = nn_network_id, - dist = nn_network_dist), - spread = spread, - min_dist = min_dist, - ...) 
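For context, the precomputed-neighbour pattern used here can be reproduced on its own roughly as follows. This is an illustrative sketch only; `X` stands for any numeric cell-by-feature (or affinity) matrix and is not an object from this diff.

    # reuse a dbscan kNN result as UMAP input instead of recomputing neighbours
    nn  <- dbscan::kNN(x = X, k = 20, sort = TRUE)
    emb <- uwot::umap(
        X = X,
        n_neighbors = 20,
        nn_method = list(idx = nn$id, dist = nn$dist),
        spread = 5,
        min_dist = 0.01
    )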
- - colnames(integrated_umap) <- c("Dim.1", "Dim.2") - - ## add umap - gobject@dimension_reduction$cells[[spat_unit]][[modality1]][["umap"]][["integrated.umap"]] <- list(name = "integrated.umap", - feat_type = modality1, - spat_unit = spat_unit, - reduction_method = "umap", - coordinates = integrated_umap, - misc = NULL) - - gobject@dimension_reduction$cells[[spat_unit]][[modality2]][["umap"]][["integrated.umap"]] <- list(name = "integrated.umap", - feat_type = modality2, - spat_unit = spat_unit, - reduction_method = "umap", - coordinates = integrated_umap, - misc = NULL) - - return(gobject) -} + spat_unit = "cell", + modality1 = "rna", + modality2 = "protein", + integrated_feat_type = NULL, + integration_method = "WNN", + matrix_result_name = "theta_weighted_matrix", + k = 20, + spread = 5, + min_dist = 0.01, + force = FALSE, + ...) { + if (is.null(integrated_feat_type)) { + integrated_feat_type <- paste0(modality1, "_", modality2) + } + + theta_weighted <- get_multiomics(gobject, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = integration_method, + result_name = matrix_result_name + ) + + theta_weighted[is.na(theta_weighted)] <- 0 + + if (is.null(gobject@nn_network[[spat_unit]][[ + modality1]]$kNN$integrated_kNN) || force == TRUE) { + ################# Calculate integrated Nearest Neighbors ############### + + message("Calculating integrated Nearest Neighbors") + + cell_names <- colnames(theta_weighted) + + nn_network <- dbscan::kNN(x = theta_weighted, k = k, sort = TRUE) + from <- to <- weight <- distance <- from_cell_ID <- to_cell_ID <- + shared <- NULL + nn_network_dt <- data.table::data.table( + from = rep( + 1:nrow(nn_network$id), + k + ), + to = as.vector(nn_network$id), + weight = 1 / (1 + as.vector(nn_network$dist)), + distance = as.vector(nn_network$dist) + ) + nn_network_dt[, `:=`(from_cell_ID, cell_names[from])] + nn_network_dt[, `:=`(to_cell_ID, cell_names[to])] + all_index <- unique( + x = c(nn_network_dt$from_cell_ID, nn_network_dt$to_cell_ID)) + + ################################ Create igraph ######################### + + message("Creating igraph") + + nn_network_igraph <- igraph::graph_from_data_frame( + nn_network_dt[, .(from_cell_ID, to_cell_ID, weight, distance)], + directed = TRUE, + vertices = all_index + ) + + ## store igraph + nnNetObj <- create_nn_net_obj( + name = "integrated_kNN", + nn_type = "kNN", + igraph = nn_network_igraph, + spat_unit = spat_unit, + feat_type = modality1 + ) + + gobject <- set_NearestNetwork( + gobject = gobject, + nn_network = nnNetObj, + spat_unit = spat_unit, + feat_type = modality1, + nn_network_to_use = "kNN", + network_name = "integrated_kNN" + ) + + ## store nn_network id + gobject <- set_multiomics( + gobject = gobject, + result = nn_network$id, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = "WNN", + result_name = "integrated_kNN_id", + verbose = TRUE + ) + + ## store nn_network dist + gobject <- set_multiomics( + gobject = gobject, + result = nn_network$dist, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = "WNN", + result_name = "integrated_kNN_dist", + verbose = TRUE + ) + } + ######################### Calculate integrated UMAP ######################## + + message("Calculating integrated UMAP") + + nn_network_id <- getMultiomics(gobject, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = integration_method, + result_name = "integrated_kNN_id" + ) + + nn_network_dist <- 
getMultiomics(gobject, + spat_unit = spat_unit, + feat_type = integrated_feat_type, + integration_method = integration_method, + result_name = "integrated_kNN_dist" + ) + + + #### using nn_network pre-calculation + set.seed(4567) + integrated_umap <- uwot::umap( + X = theta_weighted, + n_neighbors = k, + nn_method = list( + idx = nn_network_id, + dist = nn_network_dist + ), + spread = spread, + min_dist = min_dist, + ... + ) + + colnames(integrated_umap) <- c("Dim.1", "Dim.2") + + ## add umap + gobject@dimension_reduction$cells[[spat_unit]][[modality1]][["umap"]][[ + "integrated.umap"]] <- list( + name = "integrated.umap", + feat_type = modality1, + spat_unit = spat_unit, + reduction_method = "umap", + coordinates = integrated_umap, + misc = NULL + ) + + gobject@dimension_reduction$cells[[spat_unit]][[modality2]][["umap"]][[ + "integrated.umap"]] <- list( + name = "integrated.umap", + feat_type = modality2, + spat_unit = spat_unit, + reduction_method = "umap", + coordinates = integrated_umap, + misc = NULL + ) + + return(gobject) +} diff --git a/R/zzz.R b/R/zzz.R index 4fcdde67c..d15b5883c 100644 --- a/R/zzz.R +++ b/R/zzz.R @@ -1,32 +1,22 @@ # Run on library loading -.onAttach = function(libname, pkgname) { - - - ## print version number ## - packageStartupMessage('Giotto Suite ', utils::packageVersion('Giotto')) - - check_ver = getOption('giotto.check_version', TRUE) - if(isTRUE(check_ver)) { - check_github_suite_ver() - options('giotto.check_version' = FALSE) - } - - - ## cores detection ## - check_core = getOption('giotto.check_core', TRUE) - if(isTRUE(check_core)) { - cores = determine_cores(cores = NA) - data.table::setDTthreads(threads = cores) - options('giotto.check_core' = FALSE) - } - - +.onAttach <- function(libname, pkgname) { + ## print version number ## + packageStartupMessage("Giotto Suite ", utils::packageVersion("Giotto")) + + check_ver <- getOption("giotto.check_version", TRUE) + if (isTRUE(check_ver)) { + check_github_suite_ver() + options("giotto.check_version" = FALSE) + } + + + ## cores detection ## + check_core <- getOption("giotto.check_core", TRUE) + if (isTRUE(check_core)) { + cores <- determine_cores(cores = NA) + data.table::setDTthreads(threads = cores) + options("giotto.check_core" = FALSE) + } } - - - - - - diff --git a/README.Rmd b/README.Rmd index b49fefdbd..7a649574f 100644 --- a/README.Rmd +++ b/README.Rmd @@ -7,9 +7,9 @@ output: github_document ```{r, echo = FALSE} knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "man/figures/" + collapse = TRUE, + comment = "#>", + fig.path = "man/figures/" ) ``` diff --git a/man/PAGEEnrich.Rd b/man/PAGEEnrich.Rd index 628fabc04..2a0bbff07 100644 --- a/man/PAGEEnrich.Rd +++ b/man/PAGEEnrich.Rd @@ -15,7 +15,8 @@ PAGEEnrich(...) \item{\code{feat_type}}{feature type} \item{\code{sign_matrix}}{Matrix of signature genes for each cell type / process} \item{\code{expression_values}}{expression values to use} - \item{\code{min_overlap_genes}}{minimum number of overlapping genes in sign_matrix required to calculate enrichment} + \item{\code{min_overlap_genes}}{minimum number of overlapping genes in sign_matrix +required to calculate enrichment} \item{\code{reverse_log_scale}}{reverse expression values from log scale} \item{\code{logbase}}{log base to use if reverse_log_scale = TRUE} \item{\code{output_enrichment}}{how to return enrichment output} @@ -29,7 +30,8 @@ PAGEEnrich(...) }} } \description{ -Function to calculate gene signature enrichment scores per spatial position using PAGE. 
+Function to calculate gene signature enrichment scores per +spatial position using PAGE. } \seealso{ \code{\link{runPAGEEnrich}} diff --git a/man/adapt_aspect_ratio.Rd b/man/adapt_aspect_ratio.Rd index 84b13250d..ae3981116 100644 --- a/man/adapt_aspect_ratio.Rd +++ b/man/adapt_aspect_ratio.Rd @@ -27,6 +27,7 @@ adapt_aspect_ratio( \item{mesh_obj}{mesh_obj} } \description{ -adapt the aspact ratio after inserting cross section mesh grid lines +adapt the aspact ratio after inserting cross section mesh grid +lines } \keyword{internal} diff --git a/man/addCellIntMetadata.Rd b/man/addCellIntMetadata.Rd index c23166f7e..8ae0a148c 100644 --- a/man/addCellIntMetadata.Rd +++ b/man/addCellIntMetadata.Rd @@ -36,12 +36,16 @@ addCellIntMetadata( Giotto object } \description{ -Creates an additional metadata column with information about interacting and non-interacting cell types of the +Creates an additional metadata column with information about +interacting and non-interacting cell types of the selected cell-cell interaction. } \details{ -This function will create an additional metadata column which selects interacting cell types for a specific cell-cell -interaction. For example, if you want to color interacting astrocytes and oligodendrocytes it will create a new metadata column with -the values "select_astrocytes", "select_oligodendrocytes", "other_astrocytes", "other_oligodendroyctes" and "other". Where "other" is all -other cell types found within the selected cell type column. +This function will create an additional metadata column which +selects interacting cell types for a specific cell-cell +interaction. For example, if you want to color interacting astrocytes and +oligodendrocytes it will create a new metadata column with +the values "select_astrocytes", "select_oligodendrocytes", +"other_astrocytes", "other_oligodendroyctes" and "other". Where "other" is +all other cell types found within the selected cell type column. } diff --git a/man/addFeatsPerc.Rd b/man/addFeatsPerc.Rd index b30596bf9..456d27ad0 100644 --- a/man/addFeatsPerc.Rd +++ b/man/addFeatsPerc.Rd @@ -30,8 +30,10 @@ addFeatsPerc( \item{return_gobject}{boolean: return giotto object (default = TRUE)} } \value{ -giotto object if \code{return_gobject = TRUE}, else a vector with % results +giotto object if \code{return_gobject = TRUE}, else a vector with % +results } \description{ -Calculates the total percentage of (normalized) counts for a subset of selected genes +Calculates the total percentage of (normalized) counts for a +subset of selected genes } diff --git a/man/addHMRF_V2.Rd b/man/addHMRF_V2.Rd index 031809eda..d496196f0 100644 --- a/man/addHMRF_V2.Rd +++ b/man/addHMRF_V2.Rd @@ -17,5 +17,8 @@ addHMRF_V2(gobject, HMRFoutput, name = "hmrf") function to add HMRF Domain Type to cell meta data } \details{ -This function appends HMRF domain clusters to corresponding cell meta data for all the beta values, with the given HMRF model names. For example, if name = ‘hmrf1’ and name of result in HMRFoutput is ‘k=8 b=0.00’, the appended cell meta data column will be named with ‘hmrf1 k=8 b=0.00’ +This function appends HMRF domain clusters to corresponding cell meta data +for all the beta values, with the given HMRF model names. 
For example, if +name = ‘hmrf1’ and name of result in HMRFoutput is ‘k=8 b=0.00’, the +appended cell meta data column will be named with ‘hmrf1 k=8 b=0.00’ } diff --git a/man/addPolygonCells.Rd b/man/addPolygonCells.Rd index 594d94bc8..772db44b3 100644 --- a/man/addPolygonCells.Rd +++ b/man/addPolygonCells.Rd @@ -25,28 +25,31 @@ addPolygonCells( \item{feat_type}{feature name where metadata will be added} -\item{polygons}{polygon names to plot (e.g. 'polygon_1'). If NULL, plots all available polygons} +\item{polygons}{polygon names to plot (e.g. 'polygon_1'). If NULL, plots +all available polygons} -\item{na.label}{polygon label for cells located outside of polygons area. Default = "no_polygon"} +\item{na.label}{polygon label for cells located outside of polygons area. +Default = "no_polygon"} } \value{ A Giotto object with a modified cell_metadata slot that includes the -polygon name where each cell is located or no_polygon label if the cell is not located -within a polygon area +polygon name where each cell is located or no_polygon label if the cell is +not located within a polygon area } \description{ Add corresponding polygon IDs to cell metadata } \examples{ - \dontrun{ ## Plot interactive polygons my_polygon_coords <- plotInteractivePolygons(my_spatPlot) ## Add polygon coordinates to Giotto object my_giotto_polygons <- createGiottoPolygonsFromDfr(my_polygon_coords) -my_giotto_object <- addGiottoPolygons(gobject = my_giotto_object, - gpolygons = list(my_giotto_polygons)) +my_giotto_object <- addGiottoPolygons( + gobject = my_giotto_object, + gpolygons = list(my_giotto_polygons) +) ## Add polygon IDs to cell metadata my_giotto_object <- addPolygonCells(my_giotto_object) diff --git a/man/adjustGiottoMatrix.Rd b/man/adjustGiottoMatrix.Rd index 7afa89a65..58f63ba7f 100644 --- a/man/adjustGiottoMatrix.Rd +++ b/man/adjustGiottoMatrix.Rd @@ -24,9 +24,11 @@ adjustGiottoMatrix( \item{expression_values}{expression values to use} -\item{batch_columns}{metadata columns that represent different batch (max = 2)} +\item{batch_columns}{metadata columns that represent different +batch (max = 2)} -\item{covariate_columns}{metadata columns that represent covariates to regress out} +\item{covariate_columns}{metadata columns that represent covariates to +regress out} \item{return_gobject}{boolean: return giotto object (default = TRUE)} @@ -36,9 +38,11 @@ adjustGiottoMatrix( giotto object or exprObj } \description{ -Adjust expression values to account for known batch effects or technological covariates. +Adjust expression values to account for known batch effects or +technological covariates. } \details{ -This function implements the \code{\link[limma]{removeBatchEffect}} function to -remove known batch effects and to adjust expression values according to provided covariates. +This function implements the \code{\link[limma]{removeBatchEffect}} +function to remove known batch effects and to adjust expression values +according to provided covariates. 
} diff --git a/man/binSpect.Rd b/man/binSpect.Rd index 70f227cf9..1d0f72710 100644 --- a/man/binSpect.Rd +++ b/man/binSpect.Rd @@ -55,13 +55,15 @@ binSpect( \item{subset_genes}{deprecated, use subset_feats} -\item{spatial_network_name}{name of spatial network to use (default = 'spatial_network')} +\item{spatial_network_name}{name of spatial network to use +(default = 'spatial_network')} \item{spatial_network_k}{different k's for a spatial kNN to evaluate} \item{reduce_network}{default uses the full network} -\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset)} +\item{kmeans_algo}{kmeans algorithm to use +(kmeans, kmeans_arma, kmeans_arma_subset)} \item{nstart}{kmeans: nstart parameter} @@ -75,19 +77,22 @@ binSpect( \item{do_fisher_test}{perform fisher test} -\item{adjust_method}{p-value adjusted method to use (see \code{\link[stats]{p.adjust}})} +\item{adjust_method}{p-value adjusted method to use +(see \code{\link[stats]{p.adjust}})} \item{calc_hub}{calculate the number of hub cells} \item{hub_min_int}{minimum number of cell-cell interactions for a hub cell} -\item{get_av_expr}{calculate the average expression per gene of the high expressing cells} +\item{get_av_expr}{calculate the average expression per gene of the high +expressing cells} \item{get_high_expr}{calculate the number of high expressing cells per gene} \item{implementation}{enrichment implementation (data.table, simple, matrix)} -\item{group_size}{number of genes to process together with data.table implementation (default = automatic)} +\item{group_size}{number of genes to process together with data.table +implementation (default = automatic)} \item{do_parallel}{run calculations in parallel with mclapply} @@ -102,7 +107,8 @@ binSpect( \item{seed}{seed for kmeans binarization. When \code{NULL}, no seed is set. Otherwise, accepts a numeric input that will be used as seed.} -\item{bin_matrix}{a binarized matrix, when provided it will skip the binarization process} +\item{bin_matrix}{a binarized matrix, when provided it will skip the +binarization process} \item{summarize}{summarize the p-values or adjusted p-values} @@ -113,12 +119,14 @@ separately (default)} data.table with results (see details) } \description{ -Previously: binGetSpatialGenes. BinSpect (Binary Spatial Extraction of genes) is a fast computational method +Previously: binGetSpatialGenes. BinSpect +(Binary Spatial Extraction of genes) is a fast computational method that identifies genes with a spatially coherent expression pattern. } \details{ -We provide two ways to identify spatial genes based on gene expression binarization. -Both methods are identicial except for how binarization is performed. +We provide two ways to identify spatial genes based on gene +expression binarization. +Both methods are identical except for how binarization is performed. \itemize{ \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} \item{2. network: }{Alll cells are connected through a spatial network based on the physical coordinates} @@ -138,7 +146,10 @@ Other statistics are provided (optional): \item{Average expression of each gene within high expressing cells } \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} } -By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. 
-The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +By selecting a subset of likely spatial genes +(e.g. soft thresholding highly variable genes) can accelerate the speed. +The simple implementation is usually faster, but lacks the possibility to +run in parallel and to calculate hub cells. +The data.table implementation might be more appropriate for large datasets +by setting the group_size (number of genes) parameter to divide the workload. } diff --git a/man/binSpectMulti.Rd b/man/binSpectMulti.Rd index 30ee7fa81..fba6cf53f 100644 --- a/man/binSpectMulti.Rd +++ b/man/binSpectMulti.Rd @@ -56,7 +56,8 @@ binSpectMulti( \item{reduce_network}{default uses the full network} -\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset)} +\item{kmeans_algo}{kmeans algorithm to use +(kmeans, kmeans_arma, kmeans_arma_subset)} \item{nstart}{kmeans: nstart parameter} @@ -70,19 +71,22 @@ binSpectMulti( \item{do_fisher_test}{perform fisher test} -\item{adjust_method}{p-value adjusted method to use (see \code{\link[stats]{p.adjust}})} +\item{adjust_method}{p-value adjusted method to use +(see \code{\link[stats]{p.adjust}})} \item{calc_hub}{calculate the number of hub cells} \item{hub_min_int}{minimum number of cell-cell interactions for a hub cell} -\item{get_av_expr}{calculate the average expression per gene of the high expressing cells} +\item{get_av_expr}{calculate the average expression per gene of the high +expressing cells} \item{get_high_expr}{calculate the number of high expressing cells per gene} \item{implementation}{enrichment implementation (data.table, simple, matrix)} -\item{group_size}{number of genes to process together with data.table implementation (default = automatic)} +\item{group_size}{number of genes to process together with data.table +implementation (default = automatic)} \item{do_parallel}{run calculations in parallel with mclapply} @@ -106,8 +110,9 @@ data.table with results (see details) binSpect for multiple spatial kNN networks } \details{ -We provide two ways to identify spatial genes based on gene expression binarization. -Both methods are identicial except for how binarization is performed. +We provide two ways to identify spatial genes based on gene +expression binarization. +Both methods are identical except for how binarization is performed. \itemize{ \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} \item{2. network: }{Alll cells are connected through a spatial network based on the physical coordinates} @@ -127,7 +132,10 @@ Other statistics are provided (optional): \item{Average expression of each gene within high expressing cells } \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} } -By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +By selecting a subset of likely spatial genes +(e.g. 
soft thresholding highly variable genes) can accelerate the speed. +The simple implementation is usually faster, but lacks the possibility to +run in parallel and to calculate hub cells. +The data.table implementation might be more appropriate for large datasets +by setting the group_size (number of genes) parameter to divide the workload. } diff --git a/man/binSpectMultiMatrix.Rd b/man/binSpectMultiMatrix.Rd index 93a2f82db..ebc47b6c5 100644 --- a/man/binSpectMultiMatrix.Rd +++ b/man/binSpectMultiMatrix.Rd @@ -25,7 +25,7 @@ binSpectMultiMatrix( group_size = "automatic", do_parallel = TRUE, cores = NA, - verbose = T, + verbose = TRUE, knn_params = NULL, set.seed = deprecated(), seed = 1234, @@ -41,7 +41,8 @@ binSpectMultiMatrix( \item{subset_feats}{only select a subset of features to test} -\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset)} +\item{kmeans_algo}{kmeans algorithm to use +(kmeans, kmeans_arma, kmeans_arma_subset)} \item{nstart}{kmeans: nstart parameter} @@ -55,19 +56,22 @@ binSpectMultiMatrix( \item{do_fisher_test}{perform fisher test} -\item{adjust_method}{p-value adjusted method to use (see \code{\link[stats]{p.adjust}})} +\item{adjust_method}{p-value adjusted method to use +(see \code{\link[stats]{p.adjust}})} \item{calc_hub}{calculate the number of hub cells} \item{hub_min_int}{minimum number of cell-cell interactions for a hub cell} -\item{get_av_expr}{calculate the average expression per gene of the high expressing cells} +\item{get_av_expr}{calculate the average expression per gene of the high +expressing cells} \item{get_high_expr}{calculate the number of high expressing cells per gene} \item{implementation}{enrichment implementation (data.table, simple, matrix)} -\item{group_size}{number of genes to process together with data.table implementation (default = automatic)} +\item{group_size}{number of genes to process together with data.table +implementation (default = automatic)} \item{do_parallel}{run calculations in parallel with mclapply} @@ -88,5 +92,6 @@ is set.} data.table with results } \description{ -binSpect for a single spatial network and a provided expression matrix +binSpect for a single spatial network and a provided +expression matrix } diff --git a/man/binSpectSingle.Rd b/man/binSpectSingle.Rd index 1f0cd11ad..a146976b3 100644 --- a/man/binSpectSingle.Rd +++ b/man/binSpectSingle.Rd @@ -51,11 +51,13 @@ binSpectSingle( \item{subset_genes}{deprecated, use subset_feats} -\item{spatial_network_name}{name of spatial network to use (default = 'spatial_network')} +\item{spatial_network_name}{name of spatial network to use +(default = 'spatial_network')} \item{reduce_network}{default uses the full network} -\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset)} +\item{kmeans_algo}{kmeans algorithm to use +(kmeans, kmeans_arma, kmeans_arma_subset)} \item{nstart}{kmeans: nstart parameter} @@ -69,19 +71,22 @@ binSpectSingle( \item{do_fisher_test}{perform fisher test} -\item{adjust_method}{p-value adjusted method to use (see \code{\link[stats]{p.adjust}})} +\item{adjust_method}{p-value adjusted method to use +(see \code{\link[stats]{p.adjust}})} \item{calc_hub}{calculate the number of hub cells} \item{hub_min_int}{minimum number of cell-cell interactions for a hub cell} -\item{get_av_expr}{calculate the average expression per gene of the high expressing cells} +\item{get_av_expr}{calculate the average expression per gene of the high +expressing cells} \item{get_high_expr}{calculate the number of high 
expressing cells per gene} \item{implementation}{enrichment implementation (data.table, simple, matrix)} -\item{group_size}{number of genes to process together with data.table implementation (default = automatic)} +\item{group_size}{number of genes to process together with data.table +implementation (default = automatic)} \item{do_parallel}{run calculations in parallel with mclapply} @@ -94,7 +99,8 @@ binSpectSingle( \item{seed}{Seed for kmeans binarization. If NULL passed, no seed is set. Otherwise, the input value is used as seed.} -\item{bin_matrix}{a binarized matrix, when provided it will skip the binarization process} +\item{bin_matrix}{a binarized matrix, when provided it will skip the +binarization process} } \value{ data.table with results (see details) @@ -103,7 +109,8 @@ data.table with results (see details) binSpect for a single spatial network } \details{ -We provide two ways to identify spatial genes based on gene expression binarization. +We provide two ways to identify spatial genes based on gene +expression binarization. Both methods are identicial except for how binarization is performed. \itemize{ \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} @@ -124,7 +131,10 @@ Other statistics are provided (optional): \item{Average expression of each gene within high expressing cells } \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} } -By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +Selecting a subset of likely spatial genes +(e.g. soft thresholding highly variable genes) can accelerate the speed. +The simple implementation is usually faster, but lacks the possibility to +run in parallel and to calculate hub cells. +The data.table implementation might be more appropriate for large datasets +by setting the group_size (number of genes) parameter to divide the workload.
} diff --git a/man/binSpectSingleMatrix.Rd b/man/binSpectSingleMatrix.Rd index 6dd9b9583..9890aa08c 100644 --- a/man/binSpectSingleMatrix.Rd +++ b/man/binSpectSingleMatrix.Rd @@ -36,13 +36,15 @@ binSpectSingleMatrix( \item{spatial_network}{spatial network in data.table format} -\item{bin_matrix}{a binarized matrix, when provided it will skip the binarization process} +\item{bin_matrix}{a binarized matrix, when provided it will skip the +binarization process} \item{bin_method}{method to binarize gene expression} \item{subset_feats}{only select a subset of features to test} -\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, kmeans_arma_subset)} +\item{kmeans_algo}{kmeans algorithm to use (kmeans, kmeans_arma, +kmeans_arma_subset)} \item{nstart}{kmeans: nstart parameter} @@ -56,19 +58,22 @@ binSpectSingleMatrix( \item{do_fisher_test}{perform fisher test} -\item{adjust_method}{p-value adjusted method to use (see \code{\link[stats]{p.adjust}})} +\item{adjust_method}{p-value adjusted method to use +(see \code{\link[stats]{p.adjust}})} \item{calc_hub}{calculate the number of hub cells} \item{hub_min_int}{minimum number of cell-cell interactions for a hub cell} -\item{get_av_expr}{calculate the average expression per gene of the high expressing cells} +\item{get_av_expr}{calculate the average expression per gene of the high +expressing cells} \item{get_high_expr}{calculate the number of high expressing cells per gene} \item{implementation}{enrichment implementation (data.table, simple, matrix)} -\item{group_size}{number of genes to process together with data.table implementation (default = automatic)} +\item{group_size}{number of genes to process together with data.table +implementation (default = automatic)} \item{do_parallel}{run calculations in parallel with mclapply} @@ -85,10 +90,12 @@ Otherwise, the input value is used as seed.} data.table with results (see details) } \description{ -binSpect for a single spatial network and a provided expression matrix +binSpect for a single spatial network and a provided +expression matrix } \details{ -We provide two ways to identify spatial genes based on gene expression binarization. +We provide two ways to identify spatial genes based on gene +expression binarization. Both methods are identicial except for how binarization is performed. \itemize{ \item{1. binarize: }{Each gene is binarized (0 or 1) in each cell with \bold{kmeans} (k = 2) or based on \bold{rank} percentile} @@ -109,7 +116,10 @@ Other statistics are provided (optional): \item{Average expression of each gene within high expressing cells } \item{Number of hub cells, these are high expressing cells that have a user defined number of high expressing neighbors} } -By selecting a subset of likely spatial genes (e.g. soft thresholding highly variable genes) can accelerate the speed. -The simple implementation is usually faster, but lacks the possibility to run in parallel and to calculate hub cells. -The data.table implementation might be more appropriate for large datasets by setting the group_size (number of genes) parameter to divide the workload. +Selecting a subset of likely spatial genes (e.g. soft thresholding +highly variable genes) can accelerate the speed. +The simple implementation is usually faster, but lacks the possibility to +run in parallel and to calculate hub cells. +The data.table implementation might be more appropriate for large datasets +by setting the group_size (number of genes) parameter to divide the workload.
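The binSpect man pages above describe the binarize-then-test workflow (kmeans or rank binarization, Fisher test over a spatial network, optional hub-cell and expression summaries) but none of them shows a call. A minimal sketch of how the documented arguments fit together, assuming a Giotto object `g` that already holds expression data and a spatial network; the specific values are illustrative, not package defaults:

# assumes a Giotto object `g` with a spatial network already created
# (e.g. via createSpatialNetwork()); argument values are illustrative
library(Giotto)

spat_feats <- binSpect(
    gobject = g,
    bin_method = "kmeans",          # binarize each feature with kmeans (k = 2); "rank" is the alternative
    do_fisher_test = TRUE,          # Fisher test on high/high neighbor pairs in the network
    calc_hub = TRUE,                # hub cells are not available with the "simple" implementation
    get_av_expr = TRUE,             # average expression of the high-expressing cells
    get_high_expr = TRUE,           # number of high-expressing cells per feature
    implementation = "data.table",  # "data.table", "simple", or "matrix"
    group_size = 500,               # features processed together (data.table implementation only)
    do_parallel = TRUE
)

head(spat_feats)  # data.table with one row of statistics per feature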
} diff --git a/man/calculateHVF.Rd b/man/calculateHVF.Rd index 7b08086ab..7f7a2bb1d 100644 --- a/man/calculateHVF.Rd +++ b/man/calculateHVF.Rd @@ -41,26 +41,31 @@ calculateHVF( \item{method}{method to calculate highly variable features} -\item{reverse_log_scale}{reverse log-scale of expression values (default = FALSE)} +\item{reverse_log_scale}{reverse log-scale of expression values +(default = FALSE)} \item{logbase}{if \code{reverse_log_scale} is TRUE, which log base was used?} \item{expression_threshold}{expression threshold to consider a gene detected} -\item{nr_expression_groups}{(cov_groups) number of expression groups for cov_groups} +\item{nr_expression_groups}{(cov_groups) number of expression groups for +cov_groups} \item{zscore_threshold}{(cov_groups) zscore to select hvg for cov_groups} \item{HVFname}{name for highly variable features in cell metadata} -\item{difference_in_cov}{(cov_loess) minimum difference in coefficient of variance required} +\item{difference_in_cov}{(cov_loess) minimum difference in coefficient of +variance required} -\item{var_threshold}{(var_p_resid) variance threshold for features for var_p_resid method} +\item{var_threshold}{(var_p_resid) variance threshold for features for +var_p_resid method} -\item{var_number}{(var_p_resid) number of top variance features for var_p_resid method} +\item{var_number}{(var_p_resid) number of top variance features for +var_p_resid method} -\item{random_subset}{random subset to perform HVF detection on. Passing \code{NULL} -runs HVF on all cells.} +\item{random_subset}{random subset to perform HVF detection on. +Passing \code{NULL} runs HVF on all cells.} \item{set_seed}{logical. whether to set a seed when random_subset is used} @@ -72,14 +77,17 @@ runs HVF on all cells.} \item{save_plot}{logical. directly save the plot} -\item{save_param}{list of saving parameters from \code{\link[GiottoVisuals:plot_save]{GiottoVisuals::all_plots_save_function()}}} +\item{save_param}{list of saving parameters from +\code{\link[GiottoVisuals:plot_save]{GiottoVisuals::all_plots_save_function()}}} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, change +save_name in save_param} \item{return_gobject}{boolean: return giotto object (default = TRUE)} } \value{ -giotto object highly variable features appended to feature metadata (\code{fDataDT()}) +giotto object highly variable features appended to feature metadata +(\code{fDataDT()}) } \description{ compute highly variable features @@ -88,11 +96,14 @@ compute highly variable features Currently we provide 2 ways to calculate highly variable genes: \strong{1. high coeff of variance (COV) within groups: } \cr -First genes are binned (\emph{nr_expression_groups}) into average expression groups and -the COV for each feature is converted into a z-score within each bin. Features with a z-score -higher than the threshold (\emph{zscore_threshold}) are considered highly variable. \cr +First genes are binned (\emph{nr_expression_groups}) into average expression +groups and the COV for each feature is converted into a z-score within each +bin. Features with a z-score higher than the threshold +(\emph{zscore_threshold}) are considered highly variable. \cr \strong{2. 
high COV based on loess regression prediction: } \cr -A predicted COV is calculated for each feature using loess regression (COV~log(mean expression)) -Features that show a higher than predicted COV (\emph{difference_in_cov}) are considered highly variable. \cr +A predicted COV is calculated for each feature using loess regression +(COV~log(mean expression)) +Features that show a higher than predicted COV (\emph{difference_in_cov}) +are considered highly variable. \cr } diff --git a/man/calculate_spatial_enrichment.Rd b/man/calculate_spatial_enrichment.Rd index 9a0c1540b..36d73d74b 100644 --- a/man/calculate_spatial_enrichment.Rd +++ b/man/calculate_spatial_enrichment.Rd @@ -42,11 +42,14 @@ Calculate spatial enrichment. Multiple methods are provided. } \section{Functions}{ \itemize{ -\item \code{.calc_spatial_enrichment_minimum()}: calculate using a 'simple' and efficient for loop +\item \code{.calc_spatial_enrichment_minimum()}: calculate using a 'simple' and +efficient for loop -\item \code{.calc_spatial_enrichment_matrix()}: calculate using 'matrix' implementation +\item \code{.calc_spatial_enrichment_matrix()}: calculate using 'matrix' +implementation -\item \code{.calc_spatial_enrichment_dt()}: calculate using 'data.table' implementation +\item \code{.calc_spatial_enrichment_dt()}: calculate using 'data.table' +implementation }} \keyword{internal} diff --git a/man/cellProximityEnrichment.Rd b/man/cellProximityEnrichment.Rd index d417587eb..4918e5744 100644 --- a/man/cellProximityEnrichment.Rd +++ b/man/cellProximityEnrichment.Rd @@ -28,7 +28,8 @@ cellProximityEnrichment( \item{cluster_column}{name of column to use for clusters} -\item{number_of_simulations}{number of simulations to create expected observations} +\item{number_of_simulations}{number of simulations to create expected +observations} \item{adjust_method}{method to adjust p.values} @@ -37,18 +38,19 @@ cellProximityEnrichment( \item{seed_number}{seed number to use} } \value{ -List of cell Proximity scores (CPscores) in data.table format. The first -data.table (raw_sim_table) shows the raw observations of both the original and -simulated networks. The second data.table (enrichm_res) shows the enrichment results. +List of cell Proximity scores (CPscores) in data.table format. The +first data.table (raw_sim_table) shows the raw observations of both the +original and simulated networks. The second data.table (enrichm_res) shows +the enrichment results. } \description{ Compute cell-cell interaction enrichment (observed vs expected) } \details{ -Spatial proximity enrichment or depletion between pairs of cell types -is calculated by calculating the observed over the expected frequency -of cell-cell proximity interactions. The expected frequency is the average frequency -calculated from a number of spatial network simulations. Each individual simulation is -obtained by reshuffling the cell type labels of each node (cell) -in the spatial network. +Spatial proximity enrichment or depletion between pairs of cell +types is calculated by calculating the observed over the expected frequency +of cell-cell proximity interactions. The expected frequency is the average +frequency calculated from a number of spatial network simulations. Each +individual simulation is obtained by reshuffling the cell type labels of +each node (cell) in the spatial network. 
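The enrichment logic described above, observed versus expected cell-cell proximities with the expected frequency taken from label-reshuffling simulations, maps onto the documented arguments as in the sketch below; `g` is an assumed Giotto object with an annotated spatial network, and the metadata column name and simulation count are illustrative:

# assumes a Giotto object `g` with a Delaunay spatial network and a metadata
# column "cell_types" holding the cluster labels; values are illustrative
cp_scores <- cellProximityEnrichment(
    gobject = g,
    spatial_network_name = "Delaunay_network",
    cluster_column = "cell_types",
    number_of_simulations = 1000,  # label reshuffles used to build the expected frequency
    adjust_method = "fdr",
    set_seed = TRUE,
    seed_number = 1234
)

cp_scores$raw_sim_table  # observed and simulated interaction counts
cp_scores$enrichm_res    # enrichment / depletion per cell-type pair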
} diff --git a/man/cellProximityEnrichmentEachSpot.Rd b/man/cellProximityEnrichmentEachSpot.Rd index 4bdbae09f..a43e423bb 100644 --- a/man/cellProximityEnrichmentEachSpot.Rd +++ b/man/cellProximityEnrichmentEachSpot.Rd @@ -24,7 +24,8 @@ cellProximityEnrichmentEachSpot( \item{cluster_column}{name of column to use for clusters} } \value{ -matrix that rownames are cell-cell interaction pairs and colnames are cell_IDs +matrix whose rownames are cell-cell interaction pairs and colnames +are cell_IDs } \description{ Compute cell-cell interaction enrichment for each spot with its diff --git a/man/cellProximityEnrichmentSpots.Rd b/man/cellProximityEnrichmentSpots.Rd index 3f60a5db0..1fce9e9b1 100644 --- a/man/cellProximityEnrichmentSpots.Rd +++ b/man/cellProximityEnrichmentSpots.Rd @@ -32,9 +32,11 @@ cellProximityEnrichmentSpots( \item{cells_in_spot}{cell number in each spot} -\item{number_of_simulations}{number of simulations to create expected observations} +\item{number_of_simulations}{number of simulations to create expected +observations} -\item{adjust_method}{method to adjust p.values (e.g. "none", "fdr", "bonferroni","BH","holm", "hochberg", "hommel","BY")} +\item{adjust_method}{method to adjust p.values +(e.g. "none", "fdr", "bonferroni","BH","holm", "hochberg", "hommel","BY")} \item{set_seed}{use of seed. Default = TRUE} @@ -43,18 +45,21 @@ cellProximityEnrichmentSpots( \item{verbose}{be verbose} } \value{ -List of cell Proximity scores (CPscores) in data.table format. The first -data.table (raw_sim_table) shows the raw observations of both the original and -simulated networks. The second data.table (enrichm_res) shows the enrichment results. +List of cell Proximity scores (CPscores) in data.table format. +The first +data.table (raw_sim_table) shows the raw observations of both the original +and simulated networks. The second data.table (enrichm_res) shows the +enrichment results. } \description{ -Compute cell-cell interaction enrichment for spots (observed vs expected) +Compute cell-cell interaction enrichment for spots +(observed vs expected) } \details{ -Spatial proximity enrichment or depletion between pairs of cell types -is calculated by calculating the observed over the expected frequency -of cell-cell proximity interactions. The expected frequency is the average frequency -calculated from a number of spatial network simulations. Each individual simulation is -obtained by reshuffling the cell type labels of each node (spot) -in the spatial network. +Spatial proximity enrichment or depletion between pairs of cell +types is calculated by calculating the observed over the expected frequency +of cell-cell proximity interactions. The expected frequency is the average +frequency calculated from a number of spatial network simulations. Each +individual simulation is obtained by reshuffling the cell type labels of +each node (spot) in the spatial network.
} diff --git a/man/cellProximityHeatmap.Rd b/man/cellProximityHeatmap.Rd index b7f5f8ef2..61656a630 100644 --- a/man/cellProximityHeatmap.Rd +++ b/man/cellProximityHeatmap.Rd @@ -7,8 +7,8 @@ cellProximityHeatmap( gobject, CPscore, - scale = T, - order_cell_types = T, + scale = TRUE, + order_cell_types = TRUE, color_breaks = NULL, color_names = NULL, show_plot = NULL, @@ -27,7 +27,8 @@ cellProximityHeatmap( \item{order_cell_types}{order cell types based on enrichment correlation} -\item{color_breaks}{numerical vector of length 3 to represent min, mean and maximum} +\item{color_breaks}{numerical vector of length 3 to represent min, mean +and maximum} \item{color_names}{character color vector of length 3} diff --git a/man/cellProximityNetwork.Rd b/man/cellProximityNetwork.Rd index 1b3855fd4..233eb81f3 100644 --- a/man/cellProximityNetwork.Rd +++ b/man/cellProximityNetwork.Rd @@ -15,7 +15,7 @@ cellProximityNetwork( edge_weight_range_depletion = c(0.1, 1), edge_weight_range_enrichment = c(1, 5), layout = c("Fruchterman", "DrL", "Kamada-Kawai"), - only_show_enrichment_edges = F, + only_show_enrichment_edges = FALSE, edge_width_range = c(0.1, 2), node_size = 4, node_color_code = NULL, @@ -42,9 +42,11 @@ cellProximityNetwork( \item{rescale_edge_weights}{rescale edge weights (boolean)} -\item{edge_weight_range_depletion}{numerical vector of length 2 to rescale depleted edge weights} +\item{edge_weight_range_depletion}{numerical vector of length 2 to rescale +depleted edge weights} -\item{edge_weight_range_enrichment}{numerical vector of length 2 to rescale enriched edge weights} +\item{edge_weight_range_enrichment}{numerical vector of length 2 to rescale +enriched edge weights} \item{layout}{layout algorithm to use to draw nodes and edges} diff --git a/man/cellProximitySpatPlot.Rd b/man/cellProximitySpatPlot.Rd index 9bab1a8af..8b5875886 100644 --- a/man/cellProximitySpatPlot.Rd +++ b/man/cellProximitySpatPlot.Rd @@ -53,11 +53,13 @@ named vector of colors} ggplot } \description{ -Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode +Visualize 2D cell-cell interactions according to spatial +coordinates in ggplot mode } \details{ Description of parameters. } \seealso{ -\code{\link{cellProximitySpatPlot2D}} and \code{\link{cellProximitySpatPlot3D}} for 3D +\code{\link{cellProximitySpatPlot2D}} and +\code{\link{cellProximitySpatPlot3D}} for 3D } diff --git a/man/cellProximitySpatPlot2D.Rd b/man/cellProximitySpatPlot2D.Rd index a4603d8da..1840bbe2a 100644 --- a/man/cellProximitySpatPlot2D.Rd +++ b/man/cellProximitySpatPlot2D.Rd @@ -15,17 +15,17 @@ cellProximitySpatPlot2D( sdimy = "sdimy", cell_color = NULL, cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, network_color = NULL, spatial_network_name = "Delaunay_network", - show_grid = F, + show_grid = FALSE, grid_color = NULL, spatial_grid_name = "spatial_grid", coord_fix_ratio = 1, - show_legend = T, + show_legend = TRUE, point_size_select = 2, point_select_border_col = "black", point_select_border_stroke = 0.05, @@ -114,7 +114,8 @@ are used when this is TRUE. continuous colors when FALSE.} ggplot } \description{ -Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode +Visualize 2D cell-cell interactions according to spatial +coordinates in ggplot mode } \details{ Description of parameters. 
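The heatmap and network man pages in this stretch consume the CPscores list returned by cellProximityEnrichment(); a short sketch of that hand-off, continuing the `cp_scores` object from the previous example. The leading gobject/CPscore arguments of cellProximityNetwork() are assumed to mirror the heatmap signature, and the colors and breaks are illustrative:

# heatmap of enrichment/depletion between cell-type pairs
cellProximityHeatmap(
    gobject = g,
    CPscore = cp_scores,
    scale = TRUE,
    order_cell_types = TRUE,
    color_breaks = c(-1.5, 0, 1.5),          # min, mean, max
    color_names = c("blue", "white", "red")  # one color per break
)

# network view of the same scores
cellProximityNetwork(
    gobject = g,
    CPscore = cp_scores,
    only_show_enrichment_edges = FALSE,
    edge_weight_range_enrichment = c(1, 5)
)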
diff --git a/man/cellProximitySpatPlot3D.Rd b/man/cellProximitySpatPlot3D.Rd index 94285aaaa..7ee4885fb 100644 --- a/man/cellProximitySpatPlot3D.Rd +++ b/man/cellProximitySpatPlot3D.Rd @@ -111,7 +111,8 @@ are used when this is TRUE. continuous colors when FALSE.} plotly } \description{ -Visualize 3D cell-cell interactions according to spatial coordinates in plotly mode +Visualize 3D cell-cell interactions according to spatial +coordinates in plotly mode } \details{ Description of parameters. diff --git a/man/cellProximityVisPlot.Rd b/man/cellProximityVisPlot.Rd index 0ed1adeb6..f42e46e53 100644 --- a/man/cellProximityVisPlot.Rd +++ b/man/cellProximityVisPlot.Rd @@ -13,17 +13,17 @@ cellProximityVisPlot( sdimz = NULL, cell_color = NULL, cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, network_color = NULL, spatial_network_name = "Delaunay_network", - show_grid = F, + show_grid = FALSE, grid_color = NULL, spatial_grid_name = "spatial_grid", coord_fix_ratio = 1, - show_legend = T, + show_legend = TRUE, point_size_select = 2, point_select_border_col = "black", point_select_border_stroke = 0.05, @@ -111,7 +111,8 @@ cellProximityVisPlot( ggplot or plotly } \description{ -Visualize cell-cell interactions according to spatial coordinates +Visualize cell-cell interactions according to spatial +coordinates } \details{ Description of parameters. diff --git a/man/cellProximityVisPlot_internals.Rd b/man/cellProximityVisPlot_internals.Rd index 30cdd0f48..c484d1ab3 100644 --- a/man/cellProximityVisPlot_internals.Rd +++ b/man/cellProximityVisPlot_internals.Rd @@ -44,16 +44,16 @@ sdimy = NULL, cell_color = NULL, cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, network_color = NULL, spatial_network_name = "Delaunay_network", - show_grid = F, + show_grid = FALSE, grid_color = NULL, spatial_grid_name = "spatial_grid", - show_legend = T, + show_legend = TRUE, point_size_select = 2, point_size_other = 1, point_alpha_other = 0.3, @@ -73,16 +73,16 @@ sdimz = NULL, cell_color = NULL, cell_color_code = NULL, - color_as_factor = T, - show_other_cells = F, - show_network = F, - show_other_network = F, + color_as_factor = TRUE, + show_other_cells = FALSE, + show_network = FALSE, + show_other_network = FALSE, network_color = NULL, spatial_network_name = "Delaunay_network", - show_grid = F, + show_grid = FALSE, grid_color = NULL, spatial_grid_name = "spatial_grid", - show_legend = T, + show_legend = TRUE, point_size_select = 2, point_size_other = 1, point_alpha_other = 0.5, @@ -99,11 +99,14 @@ Create the plots for `cellProximityVisPlot()` } \section{Functions}{ \itemize{ -\item \code{.cellProximityVisPlot_2D_ggplot()}: Visualize 2D cell-cell interactions according to spatial coordinates in ggplot mode +\item \code{.cellProximityVisPlot_2D_ggplot()}: Visualize 2D cell-cell +interactions according to spatial coordinates in ggplot mode -\item \code{.cellProximityVisPlot_2D_plotly()}: Visualize 2D cell-cell interactions according to spatial coordinates in plotly mode +\item \code{.cellProximityVisPlot_2D_plotly()}: Visualize 2D cell-cell +interactions according to spatial coordinates in plotly mode -\item \code{.cellProximityVisPlot_3D_plotly()}: Visualize 3D cell-cell 
interactions according to spatial coordinates in plotly mode +\item \code{.cellProximityVisPlot_3D_plotly()}: Visualize 3D cell-cell +interactions according to spatial coordinates in plotly mode }} \seealso{ diff --git a/man/cell_proximity_spots.Rd b/man/cell_proximity_spots.Rd index 14f105614..aca4dbbd8 100644 --- a/man/cell_proximity_spots.Rd +++ b/man/cell_proximity_spots.Rd @@ -16,8 +16,8 @@ \arguments{ \item{cell_IDs}{cell_IDs} -\item{dwls_values}{data.table of cell type enrichment in each spot and multiply -by cell number in each spot} +\item{dwls_values}{data.table of cell type enrichment in each spot and +multiply by cell number in each spot} \item{pairs}{data.table of paired spots. Format: cell_ID1, cell_ID2, N} @@ -34,9 +34,11 @@ external spots } \section{Functions}{ \itemize{ -\item \code{.cell_proximity_spots_internal()}: Compute cell-cell interactions observed value inner each spot +\item \code{.cell_proximity_spots_internal()}: Compute cell-cell interactions observed +value inner each spot -\item \code{.cell_proximity_spots_external()}: Compute cell-cell interactions observed value for interacted spots +\item \code{.cell_proximity_spots_external()}: Compute cell-cell interactions observed +value for interacted spots \item \code{.cell_proximity_spots()}: Wrapper function diff --git a/man/checkAndFixSpatialGenes.Rd b/man/checkAndFixSpatialGenes.Rd index cbd4ac96c..29ab806d0 100644 --- a/man/checkAndFixSpatialGenes.Rd +++ b/man/checkAndFixSpatialGenes.Rd @@ -24,10 +24,13 @@ checkAndFixSpatialGenes( \item{use_score}{logical variable to select silhouetteRank score} } \description{ -function to check the selected test name for spatial gene set in Giotto object +function to check the selected test name for spatial gene set +in Giotto object } \details{ -This function checks the user specified test name of spatial gene set in Giotto object. -SilhouetteRank works only with score, and SilhouetteRankTest works only with pval. Use parameter use_score to specify. +This function checks the user specified test name of spatial gene set in +Giotto object. +SilhouetteRank works only with score, and SilhouetteRankTest works only +with pval. Use parameter use_score to specify. } \keyword{internal} diff --git a/man/chooseAvailableSpatialGenes.Rd b/man/chooseAvailableSpatialGenes.Rd index a1507b04b..aae5381ff 100644 --- a/man/chooseAvailableSpatialGenes.Rd +++ b/man/chooseAvailableSpatialGenes.Rd @@ -14,11 +14,14 @@ chooseAvailableSpatialGenes(gobject, spat_unit = NULL, feat_type = NULL) \item{feat_type}{feature type} } \description{ -function to find the test name for existing spatial gene sets in Giotto +function to find the test name for existing spatial gene sets +in Giotto } \details{ -This function outputs the available test name for existing spatial gene sets in Giotto, +This function outputs the available test name for existing spatial gene sets +in Giotto, which could be used in parameter ‘name’ in `filterSpatialGenes`. -Priorities for showing the spatial gene test names are ‘binSpect’ > ‘silhouetteRankTest’ > ‘silhouetteRank’. +Priorities for showing the spatial gene test names are ‘binSpect’ > +‘silhouetteRankTest’ > ‘silhouetteRank’. 
} \keyword{internal} diff --git a/man/clusterCells.Rd b/man/clusterCells.Rd index 1282decb3..6d3d3f325 100644 --- a/man/clusterCells.Rd +++ b/man/clusterCells.Rd @@ -18,7 +18,7 @@ clusterCells( pyth_leid_iterations = 1000, pyth_louv_resolution = 1, pyth_louv_weight_col = NULL, - python_louv_random = F, + python_louv_random = FALSE, python_path = NULL, louvain_gamma = 1, louvain_omega = 1, @@ -138,7 +138,10 @@ cluster cells using a variety of different methods Wrapper for the different clustering methods. } \seealso{ -\code{\link{doLeidenCluster}}, \code{\link{.doLouvainCluster_community}}, \code{\link{.doLouvainCluster_multinet}}, -\code{\link{doLouvainCluster}}, \code{\link{doRandomWalkCluster}}, \code{\link{doSNNCluster}}, +\code{\link{doLeidenCluster}}, +\code{\link{.doLouvainCluster_community}}, +\code{\link{.doLouvainCluster_multinet}}, +\code{\link{doLouvainCluster}}, \code{\link{doRandomWalkCluster}}, +\code{\link{doSNNCluster}}, \code{\link{doKmeans}}, \code{\link{doHclust}} } diff --git a/man/combCCcom.Rd b/man/combCCcom.Rd index 8a7f7caeb..32b315810 100644 --- a/man/combCCcom.Rd +++ b/man/combCCcom.Rd @@ -30,11 +30,13 @@ combCCcom( \item{min_av_diff}{minimum average expression difference} -\item{detailed}{detailed option used with \code{\link{spatCellCellcom}} (default = FALSE)} +\item{detailed}{detailed option used with \code{\link{spatCellCellcom}} +(default = FALSE)} } \value{ combined data.table with spatial and expression communication data } \description{ -Combine spatial and expression based cell-cell communication data.tables +Combine spatial and expression based cell-cell communication +data.tables } diff --git a/man/combineCPG.Rd b/man/combineCPG.Rd index a77d2c886..2631b1008 100644 --- a/man/combineCPG.Rd +++ b/man/combineCPG.Rd @@ -13,8 +13,10 @@ combineCPG(...) \item{\code{icfObject}}{ICF (interaction changed feat) score object} \item{\code{selected_ints}}{subset of selected cell-cell interactions (optional)} \item{\code{selected_feats}}{subset of selected Feats (optional)} - \item{\code{specific_feats_1}}{specific Featset combo (need to position match specific_genes_2)} - \item{\code{specific_feats_2}}{specific Featset combo (need to position match specific_genes_1)} + \item{\code{specific_feats_1}}{specific Featset combo +(need to position match specific_genes_2)} + \item{\code{specific_feats_2}}{specific Featset combo +(need to position match specific_genes_1)} \item{\code{min_cells}}{minimum number of target cell type} \item{\code{min_int_cells}}{minimum number of interacting cell type} \item{\code{min_fdr}}{minimum adjusted p-value} diff --git a/man/combineCellProximityGenes.Rd b/man/combineCellProximityGenes.Rd index ccfd02a23..346b38834 100644 --- a/man/combineCellProximityGenes.Rd +++ b/man/combineCellProximityGenes.Rd @@ -13,8 +13,10 @@ combineCellProximityGenes(...) 
\item{\code{icfObject}}{ICF (interaction changed feat) score object} \item{\code{selected_ints}}{subset of selected cell-cell interactions (optional)} \item{\code{selected_feats}}{subset of selected Features (optional)} - \item{\code{specific_feats_1}}{specific Featureset combo (need to position match specific_feats_2)} - \item{\code{specific_feats_2}}{specific Featureset combo (need to position match specific_feats_1)} + \item{\code{specific_feats_1}}{specific Featureset combo +(need to position match specific_feats_2)} + \item{\code{specific_feats_2}}{specific Featureset combo +(need to position match specific_feats_1)} \item{\code{min_cells}}{minimum number of target cell type} \item{\code{min_int_cells}}{minimum number of interacting cell type} \item{\code{min_fdr}}{minimum adjusted p-value} diff --git a/man/combineICF.Rd b/man/combineICF.Rd index 3c3887a5b..f19eaccd6 100644 --- a/man/combineICF.Rd +++ b/man/combineICF.Rd @@ -16,7 +16,7 @@ combineICF( min_spat_diff = 0, min_log2_fc = 0.5, do_parallel = TRUE, - verbose = T + verbose = TRUE ) } \arguments{ @@ -26,9 +26,11 @@ combineICF( \item{selected_feats}{subset of selected Feats (optional)} -\item{specific_feats_1}{specific Featset combo (need to position match specific_genes_2)} +\item{specific_feats_1}{specific Featset combo +(need to position match specific_genes_2)} -\item{specific_feats_2}{specific Featset combo (need to position match specific_genes_1)} +\item{specific_feats_2}{specific Featset combo +(need to position match specific_genes_1)} \item{min_cells}{minimum number of target cell type} diff --git a/man/combineICG.Rd b/man/combineICG.Rd index 5cce4bed4..4947fbb72 100644 --- a/man/combineICG.Rd +++ b/man/combineICG.Rd @@ -13,8 +13,10 @@ combineICG(...) \item{\code{icfObject}}{ICF (interaction changed feat) score object} \item{\code{selected_ints}}{subset of selected cell-cell interactions (optional)} \item{\code{selected_feats}}{subset of selected Feats (optional)} - \item{\code{specific_feats_1}}{specific Featset combo (need to position match specific_genes_2)} - \item{\code{specific_feats_2}}{specific Featset combo (need to position match specific_genes_1)} + \item{\code{specific_feats_1}}{specific Featset combo +(need to position match specific_genes_2)} + \item{\code{specific_feats_2}}{specific Featset combo +(need to position match specific_genes_1)} \item{\code{min_cells}}{minimum number of target cell type} \item{\code{min_int_cells}}{minimum number of interacting cell type} \item{\code{min_fdr}}{minimum adjusted p-value} diff --git a/man/combineInteractionChangedFeats.Rd b/man/combineInteractionChangedFeats.Rd index 61a2f18ac..0614bcc17 100644 --- a/man/combineInteractionChangedFeats.Rd +++ b/man/combineInteractionChangedFeats.Rd @@ -16,7 +16,7 @@ combineInteractionChangedFeats( min_spat_diff = 0, min_log2_fc = 0.5, do_parallel = TRUE, - verbose = T + verbose = TRUE ) } \arguments{ @@ -26,9 +26,11 @@ combineInteractionChangedFeats( \item{selected_feats}{subset of selected Features (optional)} -\item{specific_feats_1}{specific Featureset combo (need to position match specific_feats_2)} +\item{specific_feats_1}{specific Featureset combo +(need to position match specific_feats_2)} -\item{specific_feats_2}{specific Featureset combo (need to position match specific_feats_1)} +\item{specific_feats_2}{specific Featureset combo +(need to position match specific_feats_1)} \item{min_cells}{minimum number of target cell type} diff --git a/man/combineInteractionChangedGenes.Rd b/man/combineInteractionChangedGenes.Rd index 
19fbfa0d8..7fc57dde1 100644 --- a/man/combineInteractionChangedGenes.Rd +++ b/man/combineInteractionChangedGenes.Rd @@ -13,8 +13,10 @@ combineInteractionChangedGenes(...) \item{\code{icfObject}}{ICF (interaction changed feat) score object} \item{\code{selected_ints}}{subset of selected cell-cell interactions (optional)} \item{\code{selected_feats}}{subset of selected Features (optional)} - \item{\code{specific_feats_1}}{specific Featureset combo (need to position match specific_feats_2)} - \item{\code{specific_feats_2}}{specific Featureset combo (need to position match specific_feats_1)} + \item{\code{specific_feats_1}}{specific Featureset combo +(need to position match specific_feats_2)} + \item{\code{specific_feats_2}}{specific Featureset combo +(need to position match specific_feats_1)} \item{\code{min_cells}}{minimum number of target cell type} \item{\code{min_int_cells}}{minimum number of interacting cell type} \item{\code{min_fdr}}{minimum adjusted p-value} diff --git a/man/comparePolygonExpression.Rd b/man/comparePolygonExpression.Rd index 6ba828de0..86b690671 100644 --- a/man/comparePolygonExpression.Rd +++ b/man/comparePolygonExpression.Rd @@ -26,9 +26,11 @@ comparePolygonExpression( \item{selected_feats}{vector of selected features to plot} -\item{expression_values}{gene expression values to use ("normalized", "scaled", "custom")} +\item{expression_values}{gene expression values to use +("normalized", "scaled", "custom")} -\item{method}{method to use to detect differentially expressed feats ("scran", "gini", "mast")} +\item{method}{method to use to detect differentially expressed feats +("scran", "gini", "mast")} \item{\dots}{Arguments passed to \link[ComplexHeatmap]{Heatmap}} } diff --git a/man/convertEnsemblToGeneSymbol.Rd b/man/convertEnsemblToGeneSymbol.Rd index d5370b630..88d106181 100644 --- a/man/convertEnsemblToGeneSymbol.Rd +++ b/man/convertEnsemblToGeneSymbol.Rd @@ -15,7 +15,8 @@ convertEnsemblToGeneSymbol(matrix, species = c("mouse", "human")) expression matrix with gene symbols as rownames } \description{ -This function convert ensembl gene IDs from a matrix to official gene symbols +This function converts ensembl gene IDs from a matrix to +official gene symbols } \details{ This function requires that the biomaRt library is installed diff --git a/man/createArchRProj.Rd b/man/createArchRProj.Rd index 8e77db4c1..b43e48c6f 100644 --- a/man/createArchRProj.Rd +++ b/man/createArchRProj.Rd @@ -25,16 +25,20 @@ These files can be in one of the following formats: (i) scATAC tabix files, \item{genome}{A string indicating the default genome to be used for all ArchR functions. Currently supported values include "hg19","hg38","mm9", and "mm10". -This value is stored as a global environment variable, not part of the ArchRProject. +This value is stored as a global environment variable, not part of the +ArchRProject. This can be overwritten on a per-function basis using the given function's geneAnnotationand genomeAnnotation parameter. For something other than one of -the currently supported, see createGeneAnnnotation() and createGenomeAnnnotation()} +the currently supported, see createGeneAnnotation() and +createGenomeAnnotation()} -\item{createArrowFiles_params}{list of parameters passed to `ArchR::createArrowFiles`} +\item{createArrowFiles_params}{list of parameters passed to +`ArchR::createArrowFiles`} \item{ArchRProject_params}{list of parameters passed to `ArchR::ArchRProject`} -\item{addIterativeLSI_params}{list of parameters passed to `ArchR::addIterativeLSI`} +\item{addIterativeLSI_params}{list of parameters passed to +`ArchR::addIterativeLSI`} \item{threads}{number of threads to use. Default = `ArchR::getArchRThreads()`} @@ -43,7 +47,8 @@ the currently supported, see createGeneAnnnotation() and createGenomeAnnnotation \item{verbose}{Default = TRUE} } \value{ -An ArchR project with GeneScoreMatrix, TileMatrix, and TileMatrix-based LSI +An ArchR project with GeneScoreMatrix, TileMatrix, and +TileMatrix-based LSI } \description{ Create an ArchR project and run LSI dimension reduction diff --git a/man/createCrossSection.Rd b/man/createCrossSection.Rd index f4694b9b0..3d01596a5 100644 --- a/man/createCrossSection.Rd +++ b/man/createCrossSection.Rd @@ -33,38 +33,58 @@ createCrossSection( \item{name}{name of cress section object. (default = cross_sectino)} -\item{spatial_network_name}{name of spatial network object. (default = Delaunay_network)} +\item{spatial_network_name}{name of spatial network object. +(default = Delaunay_network)} -\item{thickness_unit}{unit of the virtual section thickness. If "cell", average size of the observed cells is used as length unit. If "natural", the unit of cell location coordinates is used.(default = cell)} +\item{thickness_unit}{unit of the virtual section thickness. If "cell", +average size of the observed cells is used as length unit. If "natural", +the unit of cell location coordinates is used. (default = cell)} \item{slice_thickness}{thickness of slice. default = 2} -\item{cell_distance_estimate_method}{method to estimate average distance between neighobring cells. (default = mean)} +\item{cell_distance_estimate_method}{method to estimate average distance +between neighboring cells. (default = mean)} -\item{extend_ratio}{deciding the span of the cross section meshgrid, as a ratio of extension compared to the borders of the vitural tissue section. (default = 0.2)} +\item{extend_ratio}{deciding the span of the cross section meshgrid, as a +ratio of extension compared to the borders of the virtual tissue section. +(default = 0.2)} \item{method}{method to define the cross section plane. -If equation, the plane is defined by a four element numerical vector (equation) in the form of c(A,B,C,D), corresponding to a plane with equation Ax+By+Cz=D. -If 3 points, the plane is define by the coordinates of 3 points, as given by point1, point2, and point3. -If point and norm vector, the plane is defined by the coordinates of one point (point1) in the plane and the coordinates of one norm vector (normVector) to the plane. -If point and two plane vector, the plane is defined by the coordinates of one point (point1) in the plane and the coordinates of two vectors (planeVector1, planeVector2) in the plane. +If equation, the plane is defined by a four element numerical vector +(equation) in the form of c(A,B,C,D), corresponding to a plane with +equation Ax+By+Cz=D. +If 3 points, the plane is defined by the coordinates of 3 points, as given by +point1, point2, and point3. +If point and norm vector, the plane is defined by the coordinates of one +point (point1) in the plane and the coordinates of one norm vector +(normVector) to the plane. +If point and two plane vectors, the plane is defined by the coordinates of +one point (point1) in the plane and the coordinates of two vectors +(planeVector1, planeVector2) in the plane. (default = equation)} -\item{equation}{equation required by method "equation".equations needs to be a numerical vector of length 4, in the form of c(A,B,C,D), which defines plane Ax+By+Cz=D.} +\item{equation}{equation required by method "equation". equation needs to be +a numerical vector of length 4, in the form of c(A,B,C,D), which defines +plane Ax+By+Cz=D.} -\item{point1}{coordinates of the first point required by method "3 points","point and norm vector", and "point and two plane vectors".} +\item{point1}{coordinates of the first point required by method +"3 points", "point and norm vector", and "point and two plane vectors".} \item{point2}{coordinates of the second point required by method "3 points"} \item{point3}{coordinates of the third point required by method "3 points"} -\item{normVector}{coordinates of the norm vector required by method "point and norm vector"} +\item{normVector}{coordinates of the norm vector required by method +"point and norm vector"} -\item{planeVector1}{coordinates of the first plane vector required by method "point and two plane vectors"} +\item{planeVector1}{coordinates of the first plane vector required by +method "point and two plane vectors"} -\item{planeVector2}{coordinates of the second plane vector required by method "point and two plane vectors"} +\item{planeVector2}{coordinates of the second plane vector required by +method "point and two plane vectors"} -\item{mesh_grid_n}{numer of meshgrid lines to generate along both directions for the cross section plane.} +\item{mesh_grid_n}{number of meshgrid lines to generate along both directions +for the cross section plane.} \item{return_gobject}{boolean: return giotto object (default = TRUE)} } @@ -75,5 +95,7 @@ giotto object with updated spatial network slot Create a virtual 2D cross section. } \details{ -Creates a virtual 2D cross section object for a given spatial network object. The users need to provide the definition of the cross section plane (see method). +Creates a virtual 2D cross section object for a given spatial +network object. The users need to provide the definition of the cross +section plane (see method). } diff --git a/man/createGiottoCosMxObject.Rd b/man/createGiottoCosMxObject.Rd index cbda75fe2..5343dde5a 100644 --- a/man/createGiottoCosMxObject.Rd +++ b/man/createGiottoCosMxObject.Rd @@ -20,10 +20,12 @@ createGiottoCosMxObject( \item{cosmx_dir}{full path to the exported cosmx directory} \item{data_to_use}{which type(s) of expression data to build the gobject with -Default is \code{'all'} information available. \code{'subcellular'} loads the transcript -coordinates only. \code{'aggregate'} loads the provided aggregated expression matrix.} +Default is \code{'all'} information available. \code{'subcellular'} loads +the transcript coordinates only.
\code{'aggregate'} loads the provided +aggregated expression matrix.} -\item{remove_background_polygon}{try to remove background polygon (default: FALSE)} +\item{remove_background_polygon}{try to remove background polygon +(default: FALSE)} \item{background_algo}{algorithm to remove background polygon} @@ -47,9 +49,10 @@ Given the path to a CosMx experiment directory, creates a Giotto object. } \details{ -[\strong{Expected Directory}] This function generates a giotto object when given a -link to a cosmx output directory. It expects the following items within the directory -where the \strong{bolded} portions are what this function matches against: +[\strong{Expected Directory}] This function generates a giotto object when +given a link to a cosmx output directory. It expects the following items +within the directory where the \strong{bolded} portions are what this +function matches against: \itemize{ \item{\strong{CellComposite} (folder of images)} \item{\strong{CellLabels} (folder of images)} @@ -63,18 +66,24 @@ where the \strong{bolded} portions are what this function matches against: [\strong{Workflows}] Workflow to use is accessed through the data_to_use param \itemize{ - \item{'all' - loads and requires subcellular information from tx_file and fov_positions_file - and also the existing aggregated information (expression, spatial locations, and metadata) + \item{'all' - loads and requires subcellular information from tx_file and + fov_positions_file + and also the existing aggregated information + (expression, spatial locations, and metadata) from exprMat_file and metadata_file.} - \item{'subcellular' - loads and requires subcellular information from tx_file and + \item{'subcellular' - loads and requires subcellular information from + tx_file and fov_positions_file only.} - \item{'aggregate' - loads and requires the existing aggregate information (expression, - spatial locations, and metadata) from exprMat_file and metadata_file.} + \item{'aggregate' - loads and requires the existing aggregate information + (expression, spatial locations, and metadata) from exprMat_file and + metadata_file.} } -[\strong{Images}] Images in the default CellComposite, CellLabels, CompartmentLabels, and CellOverlay -folders will be loaded as giotto largeImage objects in all workflows as long as they are available. -Additionally, CellComposite images will be converted to giotto image objects, making plotting with +[\strong{Images}] Images in the default CellComposite, CellLabels, +CompartmentLabels, and CellOverlay +folders will be loaded as giotto largeImage objects in all workflows as +long as they are available. Additionally, CellComposite images will be +converted to giotto image objects, making plotting with these image objects more responsive when accessing them from a server. \code{\link{showGiottoImageNames}} can be used to see the available images. } diff --git a/man/createGiottoMerscopeObject.Rd b/man/createGiottoMerscopeObject.Rd index 902b18197..d93a7caa5 100644 --- a/man/createGiottoMerscopeObject.Rd +++ b/man/createGiottoMerscopeObject.Rd @@ -37,10 +37,11 @@ createGiottoMerscopeObject( \arguments{ \item{merscope_dir}{full path to the exported merscope directory} -\item{data_to_use}{which of either the 'subcellular' or 'aggregate' information -to use for object creation} +\item{data_to_use}{which of either the 'subcellular' or 'aggregate' +information to use for object creation} -\item{FOVs}{which FOVs to use when building the subcellular object. 
(default is NULL) +\item{FOVs}{which FOVs to use when building the subcellular object. +(default is NULL) NULL loads all FOVs (very slow)} \item{calculate_overlap}{whether to run \code{\link{calculateOverlapRaster}}} @@ -65,13 +66,14 @@ provided} a giotto object } \description{ -Given the path to a MERSCOPE experiment directory, creates a Giotto -object. +Given the path to a MERSCOPE experiment directory, creates a +Giotto object. } \details{ -[\strong{Expected Directory}] This function generates a giotto object when given a -link to a MERSCOPE output directory. It expects the following items within the directory -where the \strong{bolded} portions are what this function matches against: +[\strong{Expected Directory}] This function generates a giotto object when +given a link to a MERSCOPE output directory. It expects the following items +within the directory where the \strong{bolded} portions are what this +function matches against: \itemize{ \item{\strong{cell_boundaries} (folder .hdf5 files)} \item{\strong{images} (folder of .tif images and a scalefactor/transfrom table)} @@ -82,9 +84,11 @@ where the \strong{bolded} portions are what this function matches against: } \section{Functions}{ \itemize{ -\item \code{.createGiottoMerscopeObject_subcellular()}: Create giotto object with 'subcellular' workflow +\item \code{.createGiottoMerscopeObject_subcellular()}: Create giotto object with +'subcellular' workflow -\item \code{.createGiottoMerscopeObject_aggregate()}: Create giotto object with 'aggregate' workflow +\item \code{.createGiottoMerscopeObject_aggregate()}: Create giotto object with 'aggregate' +workflow }} \keyword{internal} diff --git a/man/createGiottoObjectfromArchR.Rd b/man/createGiottoObjectfromArchR.Rd index d62173f21..35c8db106 100644 --- a/man/createGiottoObjectfromArchR.Rd +++ b/man/createGiottoObjectfromArchR.Rd @@ -20,9 +20,11 @@ createGiottoObjectfromArchR( \item{expression_feat}{Giotto object available features (e.g. atac, rna, ...)} -\item{spatial_locs}{data.table or data.frame with coordinates for cell centroids} +\item{spatial_locs}{data.table or data.frame with coordinates for cell +centroids} -\item{sampleNames}{A character vector containing the ArchR project sample name} +\item{sampleNames}{A character vector containing the ArchR project sample +name} \item{...}{additional arguments passed to `createGiottoObject`} } diff --git a/man/createGiottoVisiumObject.Rd b/man/createGiottoVisiumObject.Rd index 791c37292..6c7c17fae 100644 --- a/man/createGiottoVisiumObject.Rd +++ b/man/createGiottoVisiumObject.Rd @@ -39,8 +39,8 @@ createGiottoVisiumObject( \item{h5_tissue_positions_path}{path to tissue locations (.csv file)} -\item{h5_image_png_path}{path to tissue .png file (optional). Image autoscaling -looks for matches in the filename for either 'hires' or 'lowres'} +\item{h5_image_png_path}{path to tissue .png file (optional). Image +autoscaling looks for matches in the filename for either 'hires' or 'lowres'} \item{h5_json_scalefactors_path}{path to .json scalefactors (optional)} @@ -56,13 +56,16 @@ looks for matches in the filename for either 'hires' or 'lowres'} \item{ymin_adj}{deprecated} -\item{instructions}{list of instructions or output result from \code{\link[GiottoClass]{createGiottoInstructions}}} +\item{instructions}{list of instructions or output result from +\code{\link[GiottoClass]{createGiottoInstructions}}} -\item{expression_matrix_class}{class of expression matrix to use (e.g. 
'dgCMatrix', 'DelayedArray')} +\item{expression_matrix_class}{class of expression matrix to use +(e.g. 'dgCMatrix', 'DelayedArray')} \item{h5_file}{optional path to create an on-disk h5 file} -\item{cores}{how many cores or threads to use to read data if paths are provided} +\item{cores}{how many cores or threads to use to read data if paths are +provided} \item{verbose}{be verbose} } @@ -70,7 +73,8 @@ looks for matches in the filename for either 'hires' or 'lowres'} giotto object } \description{ -Create Giotto object directly from a 10X visium folder. Also accepts visium H5 outputs. +Create Giotto object directly from a 10X visium folder. Also +accepts visium H5 outputs. } \details{ If starting from a Visium 10X directory: diff --git a/man/createGiottoXeniumObject.Rd b/man/createGiottoXeniumObject.Rd index 7200c3346..2b8d1dac8 100644 --- a/man/createGiottoXeniumObject.Rd +++ b/man/createGiottoXeniumObject.Rd @@ -28,23 +28,25 @@ createGiottoXeniumObject( \item{load_format}{files formats from which to load the data. Either `csv` or `parquet` currently supported.} -\item{h5_expression}{(boolean) whether to load cell_feature_matrix from .h5 file. -Default is \code{TRUE}} +\item{h5_expression}{(boolean) whether to load cell_feature_matrix from .h5 +file. Default is \code{TRUE}} \item{h5_gene_ids}{use gene symbols (default) or ensembl ids for the .h5 gene expression matrix} -\item{gene_column_index}{which column from the features or genes .tsv file to use for row ids} +\item{gene_column_index}{which column from the features or genes .tsv file +to use for row ids} -\item{bounds_to_load}{vector of boundary information to load (e.g. \code{'cell'} +\item{bounds_to_load}{vector of boundary information to load +(e.g. \code{'cell'} or \code{'nucleus'} by themselves or \code{c('cell', 'nucleus')} to load both at the same time.)} -\item{qv_threshold}{Minimum Phred-scaled quality score cutoff to be included as -a subcellular transcript detection (default = 20)} +\item{qv_threshold}{Minimum Phred-scaled quality score cutoff to be included +as a subcellular transcript detection (default = 20)} -\item{key_list}{(advanced) list of grep-based keywords to split the subcellular -feature detections by feature type. See details} +\item{key_list}{(advanced) list of grep-based keywords to split the +subcellular feature detections by feature type. See details} \item{instructions}{list of instructions or output result from \code{\link[GiottoClass]{createGiottoInstructions}}} @@ -55,27 +57,29 @@ provided} \item{verbose}{be verbose when building Giotto object} } \description{ -Given the path to a Xenium experiment output folder, creates a Giotto -object +Given the path to a Xenium experiment output folder, creates a +Giotto object } \details{ [\strong{QC feature types}] Xenium provides info on feature detections that include more than only the Gene Expression specific probes. Additional probes for QC are included: \emph{blank codeword}, \emph{negative control codeword}, and -\emph{negative control probe}. These additional QC probes each occupy and are treated -as their own feature types so that they can largely remain independent of the -gene expression information. +\emph{negative control probe}. These additional QC probes each occupy and +are treated as their own feature types so that they can largely remain +independent of the gene expression information. 
[\strong{key_list}] Related to \code{data_to_use = 'subcellular'} workflow only: -Additional QC probe information is in the subcellular feature detections information -and must be separated from the gene expression information during processing. -The QC probes have prefixes that allow them to be selected from the rest of the -feature IDs. -Giotto uses a named list of keywords (\code{key_list}) to select these QC probes, -with the list names being the names that will be assigned as the feature type -of these feature detections. The default list is used when \code{key_list} = NULL. +Additional QC probe information is in the subcellular feature detections +information and must be separated from the gene expression information +during processing. +The QC probes have prefixes that allow them to be selected from the rest of +the feature IDs. +Giotto uses a named list of keywords (\code{key_list}) to select these QC +probes, with the list names being the names that will be assigned as the +feature type of these feature detections. The default list is used when +\code{key_list} = NULL. Default list: \preformatted{ diff --git a/man/createSpatialEnrich.Rd b/man/createSpatialEnrich.Rd index dc503b5a7..b05ba6227 100644 --- a/man/createSpatialEnrich.Rd +++ b/man/createSpatialEnrich.Rd @@ -17,14 +17,17 @@ createSpatialEnrich(...) \item{\code{sign_matrix}}{Matrix of signature genes for each cell type / process} \item{\code{expression_values}}{expression values to use} \item{\code{reverse_log_scale}}{reverse expression values from log scale} - \item{\code{min_overlap_genes}}{minimum number of overlapping genes in sign_matrix required to calculate enrichment (PAGE)} + \item{\code{min_overlap_genes}}{minimum number of overlapping genes in sign_matrix +required to calculate enrichment (PAGE)} \item{\code{logbase}}{log base to use if reverse_log_scale = TRUE} \item{\code{p_value}}{calculate p-value (default = FALSE)} - \item{\code{n_times}}{(page/rank) number of permutation iterations to calculate p-value} + \item{\code{n_times}}{(page/rank) number of permutation iterations to calculate +p-value} \item{\code{rbp_p}}{(rank) fractional binarization threshold (default = 0.99)} \item{\code{num_agg}}{(rank) number of top genes to aggregate (default = 100)} \item{\code{max_block}}{number of lines to process together (default = 20e6)} - \item{\code{top_percentage}}{(hyper) percentage of cells that will be considered to have gene expression with matrix binarization} + \item{\code{top_percentage}}{(hyper) percentage of cells that will be considered +to have gene expression with matrix binarization} \item{\code{output_enrichment}}{how to return enrichment output} \item{\code{name}}{to give to spatial enrichment results, default = PAGE} \item{\code{verbose}}{be verbose} @@ -32,7 +35,8 @@ createSpatialEnrich(...) }} } \description{ -Function to calculate gene signature enrichment scores per spatial position using an enrichment test. +Function to calculate gene signature enrichment scores per +spatial position using an enrichment test. } \seealso{ \code{\link{runSpatialEnrich}} diff --git a/man/createSpatialGenomicsObject.Rd b/man/createSpatialGenomicsObject.Rd index 1dfc33075..7d05c80e7 100644 --- a/man/createSpatialGenomicsObject.Rd +++ b/man/createSpatialGenomicsObject.Rd @@ -9,7 +9,8 @@ createSpatialGenomicsObject(sg_dir = NULL, instructions = NULL) \arguments{ \item{sg_dir}{full path to the exported Spatial Genomics directory} -\item{instructions}{new instructions (e.g. 
result from createGiottoInstructions)} +\item{instructions}{new instructions +(e.g. result from createGiottoInstructions)} } \description{ Given the path to a Spatial Genomics data directory, creates a diff --git a/man/create_crossSection_object.Rd b/man/create_crossSection_object.Rd index b09316bf6..9952d8760 100644 --- a/man/create_crossSection_object.Rd +++ b/man/create_crossSection_object.Rd @@ -26,29 +26,39 @@ create_crossSection_object( \item{method}{method to define the cross section plane.} -\item{thickness_unit}{unit of the virtual section thickness. If "cell", average size of the observed cells is used as length unit. If "natural", the unit of cell location coordinates is used.(default = cell)} +\item{thickness_unit}{unit of the virtual section thickness. If "cell", +average size of the observed cells is used as length unit. If "natural", +the unit of cell location coordinates is used.(default = cell)} \item{slice_thickness}{thickness of slice} -\item{cell_distance_estimate_method}{method to estimate average distance between neighobring cells. (default = mean)} +\item{cell_distance_estimate_method}{method to estimate average distance +between neighboring cells. (default = mean)} -\item{extend_ratio}{deciding the span of the cross section meshgrid, as a ratio of extension compared to the borders of the virtual tissue section. (default = 0.2)} +\item{extend_ratio}{deciding the span of the cross section meshgrid, as a +ratio of extension compared to the borders of the virtual tissue section. +(default = 0.2)} -\item{plane_equation}{a numerical vector of length 4, in the form of c(A,B,C,D), which defines plane Ax+By+Cz=D.} +\item{plane_equation}{a numerical vector of length 4, in the form of +c(A,B,C,D), which defines plane Ax+By+Cz=D.} -\item{mesh_grid_n}{number of meshgrid lines to generate along both directions for the cross section plane.} +\item{mesh_grid_n}{number of meshgrid lines to generate along both +directions for the cross section plane.} \item{mesh_obj}{object that stores the cross section meshgrid information.} \item{cell_subset}{cells selected by the cross section} -\item{cell_subset_spatial_locations}{locations of cells selected by the cross section} +\item{cell_subset_spatial_locations}{locations of cells selected by the +cross section} -\item{cell_subset_projection_locations}{3D projection coordinates of selected cells onto the cross section plane} +\item{cell_subset_projection_locations}{3D projection coordinates of +selected cells onto the cross section plane} \item{cell_subset_projection_PCA}{pca of projection coordinates} -\item{cell_subset_projection_coords}{2D PCA coordinates of selected cells in the cross section plane} +\item{cell_subset_projection_coords}{2D PCA coordinates of selected cells +in the cross section plane} } \description{ create a crossSection object diff --git a/man/create_screeplot.Rd b/man/create_screeplot.Rd index 72fcac675..0f8925edf 100644 --- a/man/create_screeplot.Rd +++ b/man/create_screeplot.Rd @@ -21,8 +21,8 @@ create screeplot with ggplot } \examples{ \dontrun{ -dr = GiottoData::loadSubObjectMini('dimObj') -scree = create_screeplot(methods::slot(dr, 'misc')$eigenvalues) +dr <- GiottoData::loadSubObjectMini("dimObj") +scree <- create_screeplot(methods::slot(dr, "misc")$eigenvalues) scree } } diff --git a/man/crossSectionGenePlot.Rd b/man/crossSectionGenePlot.Rd index 6dc072505..1352bab53 100644 --- a/man/crossSectionGenePlot.Rd +++ b/man/crossSectionGenePlot.Rd @@ -25,7 +25,8 @@ crossSectionGenePlot( 
\item{spatial_network_name}{name of spatial network to use} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, +change save_name in save_param} \item{...}{parameters for spatFeatPlot2D} } @@ -33,7 +34,8 @@ crossSectionGenePlot( ggplot } \description{ -Visualize cells and gene expression in a virtual cross section according to spatial coordinates +Visualize cells and gene expression in a virtual cross section +according to spatial coordinates } \details{ Description of parameters. diff --git a/man/crossSectionGenePlot3D.Rd b/man/crossSectionGenePlot3D.Rd index 44873a59e..6c613f91e 100644 --- a/man/crossSectionGenePlot3D.Rd +++ b/man/crossSectionGenePlot3D.Rd @@ -23,9 +23,11 @@ crossSectionGenePlot3D( \item{spatial_network_name}{name of spatial network to use} -\item{other_cell_color}{color of cells outside the cross section. default = transparent.} +\item{other_cell_color}{color of cells outside the cross section. +default = transparent.} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, change +save_name in save_param} \item{...}{parameters for spatGenePlot3D} } @@ -33,7 +35,8 @@ crossSectionGenePlot3D( ggplot } \description{ -Visualize cells and gene expression in a virtual cross section according to spatial coordinates +Visualize cells and gene expression in a virtual cross section +according to spatial coordinates } \details{ Description of parameters. diff --git a/man/crossSectionPlot.Rd b/man/crossSectionPlot.Rd index 95ceea4e8..ac4062e83 100644 --- a/man/crossSectionPlot.Rd +++ b/man/crossSectionPlot.Rd @@ -22,13 +22,15 @@ crossSectionPlot( \item{feat_type}{feature type} -\item{crossSection_obj}{cross section object as alternative input. default = NULL.} +\item{crossSection_obj}{cross section object as alternative input. +default = NULL.} \item{name}{name of virtual cross section to use} \item{spatial_network_name}{name of spatial network to use} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, +change save_name in save_param} \item{...}{parameters for spatPlot2D} } @@ -36,7 +38,8 @@ crossSectionPlot( ggplot } \description{ -Visualize cells in a virtual cross section according to spatial coordinates +Visualize cells in a virtual cross section according to +spatial coordinates } \details{ Description of parameters. diff --git a/man/crossSectionPlot3D.Rd b/man/crossSectionPlot3D.Rd index b7899465f..efb03fad5 100644 --- a/man/crossSectionPlot3D.Rd +++ b/man/crossSectionPlot3D.Rd @@ -9,7 +9,7 @@ crossSectionPlot3D( crossSection_obj = NULL, name = NULL, spatial_network_name = "Delaunay_network", - show_other_cells = T, + show_other_cells = TRUE, other_cell_color = alpha("lightgrey", 0), default_save_name = "crossSection3D", ... @@ -18,7 +18,8 @@ crossSectionPlot3D( \arguments{ \item{gobject}{giotto object} -\item{crossSection_obj}{cross section object as alternative input. default = NULL.} +\item{crossSection_obj}{cross section object as alternative input. +default = NULL.} \item{name}{name of virtual cross section to use} @@ -26,9 +27,11 @@ crossSectionPlot3D( \item{show_other_cells}{display not selected cells} -\item{other_cell_color}{color of cells outside the cross section. 
default = transparent.}
+\item{other_cell_color}{color of cells outside the cross section.
+default = transparent.}

-\item{default_save_name}{default save name for saving, don't change, change save_name in save_param}
+\item{default_save_name}{default save name for saving, don't change,
+change save_name in save_param}

\item{...}{parameters for spatPlot3D}
}
@@ -36,7 +39,8 @@ crossSectionPlot3D(
ggplot
}
\description{
-Visualize cells in a virtual cross section according to spatial coordinates
+Visualize cells in a virtual cross section according to spatial
+coordinates
}
\details{
Description of parameters.
diff --git a/man/detectSpatialCorFeats.Rd b/man/detectSpatialCorFeats.Rd
index dc7d56181..8a46d3048 100644
--- a/man/detectSpatialCorFeats.Rd
+++ b/man/detectSpatialCorFeats.Rd
@@ -36,7 +36,8 @@ detectSpatialCorFeats(
\item{spatial_network_name}{name of spatial network to use}

-\item{network_smoothing}{smoothing factor beteen 0 and 1 (default: automatic)}
+\item{network_smoothing}{smoothing factor between 0 and 1
+(default: automatic)}

\item{spatial_grid_name}{name of spatial grid to use}
@@ -51,8 +52,10 @@ returns a spatial correlation object: "spatCorObject"
Detect features that are spatially correlated
}
\details{
-For method = network, it expects a fully connected spatial network. You can make sure to create a
-fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function.
+For method = network, it expects a fully connected spatial network. You
+can make sure to create a
+fully connected network by setting minimal_k > 0 in the
+ \code{\link{createSpatialNetwork}} function.
\itemize{
\item{1. grid-averaging: }{average gene expression values within a predefined spatial grid}
\item{2. network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell
diff --git a/man/detectSpatialCorFeatsMatrix.Rd b/man/detectSpatialCorFeatsMatrix.Rd
index 11243ecb3..bdb575ef4 100644
--- a/man/detectSpatialCorFeatsMatrix.Rd
+++ b/man/detectSpatialCorFeatsMatrix.Rd
@@ -29,7 +29,8 @@ detectSpatialCorFeatsMatrix(
\item{subset_feats}{subset of features to use}

-\item{network_smoothing}{smoothing factor beteen 0 and 1 (default: automatic)}
+\item{network_smoothing}{smoothing factor between 0 and 1
+(default: automatic)}

\item{min_cells_per_grid}{minimum number of cells to consider a grid}
@@ -42,8 +43,10 @@ returns a spatial correlation object: "spatCorObject"
Detect genes that are spatially correlated
}
\details{
-For method = network, it expects a fully connected spatial network. You can make sure to create a
-fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function.
+For method = network, it expects a fully connected spatial network.
+You can make sure to create a
+fully connected network by setting minimal_k > 0 in the
+\code{\link{createSpatialNetwork}} function.
\itemize{
\item{1. grid-averaging: }{average gene expression values within a predefined spatial grid}
\item{2. network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell
diff --git a/man/detectSpatialCorGenes.Rd b/man/detectSpatialCorGenes.Rd
index 2e867bf1f..949b90ab2 100644
--- a/man/detectSpatialCorGenes.Rd
+++ b/man/detectSpatialCorGenes.Rd
@@ -36,7 +36,8 @@ detectSpatialCorGenes(
\item{spatial_network_name}{name of spatial network to use}

-\item{network_smoothing}{smoothing factor beteen 0 and 1 (default: automatic)}
+\item{network_smoothing}{smoothing factor between 0 and 1
+(default: automatic)}

\item{spatial_grid_name}{name of spatial grid to use}
@@ -51,14 +52,19 @@ returns a spatial correlation object: "spatCorObject"
Detect genes that are spatially correlated
}
\details{
-For method = network, it expects a fully connected spatial network. You can make sure to create a
-fully connected network by setting minimal_k > 0 in the \code{\link{createSpatialNetwork}} function.
+For method = network, it expects a fully connected spatial network. You
+can make sure to create a
+fully connected network by setting minimal_k > 0 in the
+\code{\link{createSpatialNetwork}} function.
\itemize{
- \item{1. grid-averaging: }{average gene expression values within a predefined spatial grid}
- \item{2. network-averaging: }{smoothens the gene expression matrix by averaging the expression within one cell
- by using the neighbours within the predefined spatial network. b is a smoothening factor
- that defaults to 1 - 1/k, where k is the median number of k-neighbors in the
- selected spatial network. Setting b = 0 means no smoothing and b = 1 means no contribution
+ \item{1. grid-averaging: }{average gene expression values within a
+ predefined spatial grid}
+ \item{2. network-averaging: }{smoothens the gene expression matrix by
+ averaging the expression within one cell
+ by using the neighbours within the predefined spatial network. b is a
+ smoothening factor that defaults to 1 - 1/k, where k is the median
+ number of k-neighbors in the selected spatial network. Setting b = 0
+ means no smoothing and b = 1 means no contribution
from its own expression.}
}
The spatCorObject can be further explored with showSpatialCorGenes()
diff --git a/man/detectSpatialPatterns.Rd b/man/detectSpatialPatterns.Rd
index 85a5110aa..9526b2843 100644
--- a/man/detectSpatialPatterns.Rd
+++ b/man/detectSpatialPatterns.Rd
@@ -9,9 +9,9 @@ detectSpatialPatterns(
expression_values = c("normalized", "scaled", "custom"),
spatial_grid_name = "spatial_grid",
min_cells_per_grid = 4,
- scale_unit = F,
+ scale_unit = FALSE,
ncp = 100,
- show_plot = T,
+ show_plot = TRUE,
PC_zscore = 1.5
)
}
@@ -20,7 +20,8 @@ detectSpatialPatterns(
\item{expression_values}{expression values to use}

-\item{spatial_grid_name}{name of spatial grid to use (default = 'spatial_grid')}
+\item{spatial_grid_name}{name of spatial grid to use
+(default = 'spatial_grid')}

\item{min_cells_per_grid}{minimum number of cells in a grid to be considered}
@@ -36,7 +37,8 @@ detectSpatialPatterns(
spatial pattern object 'spatPatObj'
}
\description{
-Identify spatial patterns through PCA on average expression in a spatial grid.
+Identify spatial patterns through PCA on average expression
+in a spatial grid.
} \details{ Steps to identify spatial patterns: diff --git a/man/doClusterProjection.Rd b/man/doClusterProjection.Rd index 7697c7afd..3c64954b7 100644 --- a/man/doClusterProjection.Rd +++ b/man/doClusterProjection.Rd @@ -40,7 +40,8 @@ doClusterProjection( \item{reduction_name}{name of shared reduction space (default name = 'pca')} -\item{dimensions_to_use}{dimensions to use in shared reduction space (default = 1:10)} +\item{dimensions_to_use}{dimensions to use in shared reduction space +(default = 1:10)} \item{knn_k}{number of k-neighbors to train a KNN classifier} @@ -54,21 +55,23 @@ doClusterProjection( giotto object (default) or data.table with cell metadata } \description{ -Use a fast KNN classifier to predict labels from a smaller giotto object +Use a fast KNN classifier to predict labels from a smaller +giotto object } \details{ -Function to train a KNN with \code{\link[FNN]{knn}}. The training data -is obtained from the source giotto object (source_gobject) using existing annotations -within the cell metadata. Cells without annotation/labels from the target giotto -object (target_gobject) will receive predicted labels (and optional probabilities -with prob = TRUE). +Function to train a KNN with \code{\link[FNN]{knn}}. The training +data is obtained from the source giotto object (source_gobject) using +existing annotations within the cell metadata. Cells without +annotation/labels from the target giotto object (target_gobject) will +receive predicted labels (and optional probabilities with prob = TRUE). -**IMPORTANT** This projection assumes that you're using the same dimension reduction -space (e.g. PCA) and number of dimensions (e.g. first 10 PCs) to train the KNN -classifier as you used to create the initial annotations/labels in the source -Giotto object. +**IMPORTANT** This projection assumes that you're using the same dimension +reduction space (e.g. PCA) and number of dimensions (e.g. first 10 PCs) to +train the KNN classifier as you used to create the initial +annotations/labels in the source Giotto object. -Altogether this is a convenience function that allow you to work with very big -data as you can predict cell labels on a smaller & subsetted Giotto object and then -project the cell labels to the remaining cells in the target Giotto object. +Altogether this is a convenience function that allow you to work with very +big data as you can predict cell labels on a smaller & subsetted Giotto +object and then project the cell labels to the remaining cells in the target +Giotto object. } diff --git a/man/doFeatureSetEnrichment.Rd b/man/doFeatureSetEnrichment.Rd index fa456970a..6b7df188f 100644 --- a/man/doFeatureSetEnrichment.Rd +++ b/man/doFeatureSetEnrichment.Rd @@ -25,32 +25,43 @@ doFeatureSetEnrichment( \arguments{ \item{dryrun}{do a dry run, default TRUE.} -\item{path_to_GSEA}{path to GSEA command line executable, e.g. gsea-XXX.jar. See details (1.) for more information.} +\item{path_to_GSEA}{path to GSEA command line executable, e.g. gsea-XXX.jar. +See details (1.) for more information.} -\item{GSEA_dataset}{path to a Human/Mouse collection from GSEA, e.g. Hallmarks C1. See details (2.) for more information.} +\item{GSEA_dataset}{path to a Human/Mouse collection from GSEA, e.g. +Hallmarks C1. See details (2.) for more information.} -\item{GSEA_ranked_file}{path to .rnk file for GSEA. See details (3.) for more information} +\item{GSEA_ranked_file}{path to .rnk file for GSEA. See details (3.) 
for +more information} -\item{output_folder}{path to which the GSEA results will be saved. Default is current working directory.} +\item{output_folder}{path to which the GSEA results will be saved. Default +is current working directory.} -\item{name_analysis_folder}{default output subdirectory prefix to which results are saved. -Will live within output_folder; equivalent of "Analysis Name" in GSEA Application.} +\item{name_analysis_folder}{default output subdirectory prefix to which +results are saved. + Will live within output_folder; equivalent of + "Analysis Name" in GSEA Application.} -\item{collapse}{only 'false' is supported. This will use your dataset as-is, in the original format.} +\item{collapse}{only 'false' is supported. This will use your dataset as-is, +in the original format.} -\item{mode}{option selected in Advanced Field "Collapsing Mode for Probe Sets => 1 gene"} +\item{mode}{option selected in Advanced Field "Collapsing Mode for +Probe Sets => 1 gene"} \item{norm}{normalization mode; only meandiv is supported.} \item{nperm}{number of permutations, default 1000} -\item{scoring_scheme}{Default "weighted", equivalent of "enrichment statistic" in GSEA Application} +\item{scoring_scheme}{Default "weighted", equivalent of +"enrichment statistic" in GSEA Application} \item{plot_top_x}{Default 20, number of enrichment plots to produce.} -\item{set_max}{default 500, equivalent to "max size; exclude larger sets" in Basic Fields in GSEA Application} +\item{set_max}{default 500, equivalent to "max size; exclude larger sets" +in Basic Fields in GSEA Application} -\item{set_min}{default 15, equivalent to "min size; exclude smaller sets" in Basic Fields in GSEA Application} +\item{set_min}{default 15, equivalent to "min size; exclude smaller sets" +in Basic Fields in GSEA Application} } \description{ Preform Gene Set Enrichment Analysis using marker genes @@ -60,10 +71,12 @@ NECESSARY PREREQUISITES 1. download and install the COMMAND line (all platforms) gsea-XXX.jar https://www.gsea-msigdb.org/gsea/downloads.jsp 1.1. download zip file -1.2. unzip and move to known location (e.g. in path/to/your/applications/gsea/GSEA_4.3.2) +1.2. unzip and move to known location +(e.g. in path/to/your/applications/gsea/GSEA_4.3.2) 2. download the Human and Mouse collections -https://www.gsea-msigdb.org/gsea/msigdb/index.jsp or zipped folder https://www.gsea-msigdb.org/gsea/downloads.jsp (all downloaded) +https://www.gsea-msigdb.org/gsea/msigdb/index.jsp or zipped folder +https://www.gsea-msigdb.org/gsea/downloads.jsp (all downloaded) 3. create ranked gene lists format: data.table or data.frame with 2 columns diff --git a/man/doGiottoClustree.Rd b/man/doGiottoClustree.Rd index 56743e14b..ea1ead244 100644 --- a/man/doGiottoClustree.Rd +++ b/man/doGiottoClustree.Rd @@ -23,7 +23,8 @@ doGiottoClustree( \item{res_vector}{vector of different resolutions to test} -\item{res_seq}{list of float numbers indicating start, end, and step size for resolution testing, i.e. (0.1, 0.6, 0.1)} +\item{res_seq}{list of float numbers indicating start, end, and step size +for resolution testing, i.e. (0.1, 0.6, 0.1)} \item{return_gobject}{default FALSE. 
See details for more info.} @@ -33,7 +34,8 @@ doGiottoClustree( \item{return_plot}{by default, pulls from provided gobject instructions} -\item{save_param}{list of saving parameters from \code{\link[GiottoVisuals:plot_save]{GiottoVisuals::all_plots_save_function()}}} +\item{save_param}{list of saving parameters from +\code{\link[GiottoVisuals:plot_save]{GiottoVisuals::all_plots_save_function()}}} \item{default_save_name}{name of saved plot, default "clustree"} @@ -49,15 +51,18 @@ doGiottoClustree( a plot object (default), OR a giotto object (if specified) } \description{ -cluster cells using leiden methodology to visualize different resolutions +cluster cells using leiden methodology to visualize different +resolutions } \details{ -This function tests different resolutions for Leiden clustering and provides a visualization -of cluster sizing as resolution varies. +This function tests different resolutions for Leiden clustering and +provides a visualization of cluster sizing as resolution varies. -By default, the tested leiden clusters are NOT saved to the Giotto object, and a plot is returned. +By default, the tested leiden clusters are NOT saved to the Giotto object, +and a plot is returned. -If return_gobject is set to TRUE, and a giotto object with \emph{all} tested leiden cluster information +If return_gobject is set to TRUE, and a giotto object with \emph{all} tested +leiden cluster information will be returned. } \seealso{ diff --git a/man/doHMRF.Rd b/man/doHMRF.Rd index 30210edcf..b20068709 100644 --- a/man/doHMRF.Rd +++ b/man/doHMRF.Rd @@ -43,7 +43,8 @@ doHMRF( \item{spatial_genes}{spatial genes to use for HMRF} -\item{spatial_dimensions}{select spatial dimensions to use, default is all possible dimensions} +\item{spatial_dimensions}{select spatial dimensions to use, default is all +possible dimensions} \item{dim_reduction_to_use}{use another dimension reduction set as input} @@ -51,13 +52,15 @@ doHMRF( \item{dimensions_to_use}{number of dimensions to use as input} -\item{seed}{seed to fix random number generator (for creating initialization of HMRF) (-1 if no fixing)} +\item{seed}{seed to fix random number generator +(for creating initialization of HMRF) (-1 if no fixing)} \item{name}{name of HMRF run} \item{k}{number of HMRF domains} -\item{betas}{betas to test for. three numbers: start_beta, beta_increment, num_betas e.g. c(0, 2.0, 50)} +\item{betas}{betas to test for. three numbers: start_beta, beta_increment, +num_betas e.g. c(0, 2.0, 50)} \item{tolerance}{tolerance} @@ -72,7 +75,8 @@ doHMRF( \item{overwrite_output}{overwrite output folder} } \value{ -Creates a directory with results that can be viewed with viewHMRFresults +Creates a directory with results that can be viewed with +viewHMRFresults } \description{ Run HMRF diff --git a/man/doHMRF_V2.Rd b/man/doHMRF_V2.Rd index fd9124237..aadaad79c 100644 --- a/man/doHMRF_V2.Rd +++ b/man/doHMRF_V2.Rd @@ -7,21 +7,32 @@ doHMRF_V2(HMRF_init_obj, betas = NULL) } \arguments{ -\item{HMRF_init_obj}{initialization object list returned from initHMRF() function} +\item{HMRF_init_obj}{initialization object list returned from initHMRF() +function} -\item{betas}{beta value of the HMRF model, controlling the smoothness of clustering. NULL value of beta will provide default values based on feature numbers, otherwise, a vector of three values: initial beta, beta increment, and number of betas} +\item{betas}{beta value of the HMRF model, controlling the smoothness of +clustering. 
NULL value of beta will provide default values based on feature +numbers, otherwise, a vector of three values: initial beta, beta increment, +and number of betas} } \description{ function to run HMRF model } \details{ -This function will run a HMRF model after initialization of HMRF. Of note is the beta parameter, the smoothing parameter. -If the users are interested in selecting results from different smoothness, we recommend running a range of betas, -hence betas specify what this range is. For example, betas=c(0,10,5) will run for the following betas: 0, 10, 20, 30, 40. -betas=c(0,5,2) will run for betas: 0, 5, 10. Setting the beta can use the following guideline: +This function will run a HMRF model after initialization of HMRF. Of note +is the beta parameter, the smoothing parameter. +If the users are interested in selecting results from different smoothness, +we recommend running a range of betas, +hence betas specify what this range is. For example, betas=c(0,10,5) will +run for the following betas: 0, 10, 20, 30, 40. +betas=c(0,5,2) will run for betas: 0, 5, 10. Setting the beta can use the +following guideline: If number of features N is 10: can be used to select a column name of highly variable features, created by (see \code{\link{calculateHVF}}) - \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features + \item feats_to_use = c('geneA', 'geneB', ...): will use all manually + provided features } By default the number of principle components that we calculate is 100, which may not encompass all the variation within the dataset. Setting ncp to NULL diff --git a/man/runPCAprojection.Rd b/man/runPCAprojection.Rd index ae2546d97..26a3c1488 100644 --- a/man/runPCAprojection.Rd +++ b/man/runPCAprojection.Rd @@ -69,10 +69,12 @@ runPCAprojection( giotto object with updated PCA dimension recuction } \description{ -runs a Principal Component Analysis on a random subet + projection +runs a Principal Component Analysis on a random +subset + projection } \details{ -See \code{\link[BiocSingular]{runPCA}} and \code{\link[FactoMineR]{PCA}} for more information about other parameters. +See \code{\link[BiocSingular]{runPCA}} and +\code{\link[FactoMineR]{PCA}} for more information about other parameters. This PCA implementation is similar to \code{\link{runPCA}}, except that it performs PCA on a subset of the cells or features, and predict on the others. This can significantly increase speed without sacrificing accuracy too much. @@ -80,6 +82,7 @@ This can significantly increase speed without sacrificing accuracy too much. 
\item feats_to_use = NULL: will use all features from the selected matrix \item feats_to_use = : can be used to select a column name of highly variable features, created by (see \code{\link{calculateHVF}}) - \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features + \item feats_to_use = c('geneA', 'geneB', ...): will use all manually + provided features } } diff --git a/man/runPCAprojectionBatch.Rd b/man/runPCAprojectionBatch.Rd index f193e08ca..e3b69e049 100644 --- a/man/runPCAprojectionBatch.Rd +++ b/man/runPCAprojectionBatch.Rd @@ -69,20 +69,26 @@ runPCAprojectionBatch( \item{...}{additional parameters for PCA (see details)} } \value{ -giotto object with updated PCA dimension recuction +giotto object with updated PCA dimension reduction } \description{ -runs a Principal Component Analysis on multiple random batches + projection +runs a Principal Component Analysis on multiple random +batches + projection } \details{ -See \code{\link[BiocSingular]{runPCA}} and \code{\link[FactoMineR]{PCA}} for more information about other parameters. -This PCA implementation is similar to \code{\link{runPCA}} and \code{\link{runPCAprojection}}, -except that it performs PCA on multiple subsets (batches) of the cells or features, -and predict on the others. This can significantly increase speed without sacrificing accuracy too much. +See \code{\link[BiocSingular]{runPCA}} and +\code{\link[FactoMineR]{PCA}} for more information about other parameters. +This PCA implementation is similar to \code{\link{runPCA}} and +\code{\link{runPCAprojection}}, +except that it performs PCA on multiple subsets (batches) of the cells or +features, +and predict on the others. This can significantly increase speed without +sacrificing accuracy too much. \itemize{ \item feats_to_use = NULL: will use all features from the selected matrix \item feats_to_use = : can be used to select a column name of highly variable features, created by (see \code{\link{calculateHVF}}) - \item feats_to_use = c('geneA', 'geneB', ...): will use all manually provided features + \item feats_to_use = c('geneA', 'geneB', ...): will use all manually + provided features } } diff --git a/man/runPatternSimulation.Rd b/man/runPatternSimulation.Rd index a146e2b81..b7f117094 100644 --- a/man/runPatternSimulation.Rd +++ b/man/runPatternSimulation.Rd @@ -19,9 +19,9 @@ runPatternSimulation( spat_methods_names = c("binSpect_single", "binSpect_multi", "spatialDE", "spark", "silhouetteRank"), scalefactor = 6000, - save_plot = T, - save_raw = T, - save_norm = T, + save_plot = TRUE, + save_raw = TRUE, + save_norm = TRUE, save_dir = "~", max_col = 4, height = 7, @@ -41,7 +41,8 @@ runPatternSimulation( \item{gene_names}{selected genes} -\item{spatial_probs}{probabilities to test for a high expressing gene value to be part of the spatial pattern} +\item{spatial_probs}{probabilities to test for a high expressing gene +value to be part of the spatial pattern} \item{reps}{number of random simulation repetitions} @@ -49,9 +50,11 @@ runPatternSimulation( \item{spat_methods}{vector of spatial methods to test} -\item{spat_methods_params}{list of parameters list for each element in the vector of spatial methods to test} +\item{spat_methods_params}{list of parameters list for each element in the +vector of spatial methods to test} -\item{spat_methods_names}{name for each element in the vector of spatial elements to test} +\item{spat_methods_names}{name for each element in the vector of spatial +elements to test} \item{scalefactor}{library size 
scaling factor when re-normalizing dataset} @@ -77,5 +80,6 @@ runPatternSimulation( data.table with results } \description{ -Creates a known spatial pattern for selected genes one-by-one and runs the different spatial gene detection tests +Creates a known spatial pattern for selected genes one-by-one +and runs the different spatial gene detection tests } diff --git a/man/runRankEnrich.Rd b/man/runRankEnrich.Rd index a3eee606e..44812297f 100644 --- a/man/runRankEnrich.Rd +++ b/man/runRankEnrich.Rd @@ -57,16 +57,20 @@ runRankEnrich( data.table with enrichment results } \description{ -Function to calculate gene signature enrichment scores per spatial position using a rank based approach. +Function to calculate gene signature enrichment scores per +spatial position using a rank based approach. } \details{ -sign_matrix: a rank-fold matrix with genes as row names and cell-types as column names. -Alternatively a scRNA-seq matrix and vector with clusters can be provided to makeSignMatrixRank, which will create -the matrix for you. \cr +sign_matrix: a rank-fold matrix with genes as row names and cell-types as +column names. +Alternatively a scRNA-seq matrix and vector with clusters can be provided +to makeSignMatrixRank, which will create the matrix for you. \cr First a new rank is calculated as R = (R1*R2)^(1/2), where R1 is the rank of -fold-change for each gene in each spot and R2 is the rank of each marker in each cell type. -The Rank-Biased Precision is then calculated as: RBP = (1 - 0.99) * (0.99)^(R - 1) +fold-change for each gene in each spot and R2 is the rank of each marker in +each cell type. +The Rank-Biased Precision is then calculated as: +RBP = (1 - 0.99) * (0.99)^(R - 1) and the final enrichment score is then calculated as the sum of top 100 RBPs. 
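A worked illustration of the rank arithmetic described above (a toy sketch with assumed inputs only, not runRankEnrich()'s internal code): R1 stands for the per-spot fold-change ranks and R2 for the per-cell-type marker ranks.

# Toy sketch of the combined rank and Rank-Biased Precision described above;
# R1 and R2 are hypothetical rank vectors for 500 genes.
set.seed(1)
R1 <- sample(seq_len(500))          # rank of fold-change per gene in one spot
R2 <- sample(seq_len(500))          # rank of each marker within one cell type

R   <- sqrt(R1 * R2)                # new rank: R = (R1 * R2)^(1/2)
RBP <- (1 - 0.99) * 0.99^(R - 1)    # Rank-Biased Precision with p = 0.99

enrichment <- sum(sort(RBP, decreasing = TRUE)[1:100])  # sum of the top 100 RBPs
enrichment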
} \seealso{ diff --git a/man/runSpatialDeconv.Rd b/man/runSpatialDeconv.Rd index 25db08cf7..6fcf312b9 100644 --- a/man/runSpatialDeconv.Rd +++ b/man/runSpatialDeconv.Rd @@ -48,7 +48,8 @@ runSpatialDeconv( giotto object or deconvolution results } \description{ -Function to perform deconvolution based on single cell expression data +Function to perform deconvolution based on single cell +expression data } \seealso{ \code{\link{runDWLSDeconv}} diff --git a/man/runSpatialEnrich.Rd b/man/runSpatialEnrich.Rd index b5d1e0dbb..815bddaa2 100644 --- a/man/runSpatialEnrich.Rd +++ b/man/runSpatialEnrich.Rd @@ -39,7 +39,8 @@ runSpatialEnrich( \item{expression_values}{expression values to use} -\item{min_overlap_genes}{minimum number of overlapping genes in sign_matrix required to calculate enrichment (PAGE)} +\item{min_overlap_genes}{minimum number of overlapping genes in sign_matrix +required to calculate enrichment (PAGE)} \item{reverse_log_scale}{reverse expression values from log scale} @@ -47,7 +48,8 @@ runSpatialEnrich( \item{p_value}{calculate p-value (default = FALSE)} -\item{n_times}{(page/rank) number of permutation iterations to calculate p-value} +\item{n_times}{(page/rank) number of permutation iterations to calculate +p-value} \item{rbp_p}{(rank) fractional binarization threshold (default = 0.99)} @@ -55,7 +57,8 @@ runSpatialEnrich( \item{max_block}{number of lines to process together (default = 20e6)} -\item{top_percentage}{(hyper) percentage of cells that will be considered to have gene expression with matrix binarization} +\item{top_percentage}{(hyper) percentage of cells that will be considered +to have gene expression with matrix binarization} \item{output_enrichment}{how to return enrichment output} @@ -69,7 +72,8 @@ runSpatialEnrich( Giotto object or enrichment results if return_gobject = FALSE } \description{ -Function to calculate gene signature enrichment scores per spatial position using an enrichment test. +Function to calculate gene signature enrichment scores per +spatial position using an enrichment test. } \details{ For details see the individual functions: diff --git a/man/runUMAPprojection.Rd b/man/runUMAPprojection.Rd index 2812cf25e..dd6e74f1d 100644 --- a/man/runUMAPprojection.Rd +++ b/man/runUMAPprojection.Rd @@ -25,7 +25,7 @@ runUMAPprojection( spread = 5, set_seed = TRUE, seed_number = 1234, - verbose = T, + verbose = TRUE, toplevel_params = 2, ... ) @@ -84,7 +84,8 @@ giotto object with updated UMAP dimension reduction run UMAP on subset and project on the rest } \details{ -See \code{\link[uwot]{umap}} for more information about these and other parameters. +See \code{\link[uwot]{umap}} for more information about these and +other parameters. \itemize{ \item Input for UMAP dimension reduction can be another dimension reduction (default = 'pca') \item To use gene expression as input set dim_reduction_to_use = NULL diff --git a/man/runWNN.Rd b/man/runWNN.Rd index d6eb0aefa..4a542f892 100644 --- a/man/runWNN.Rd +++ b/man/runWNN.Rd @@ -45,7 +45,9 @@ runWNN( \item{verbose}{be verbose} } \value{ -A Giotto object with integrated UMAP (integrated.umap) within the dimension_reduction slot and Leiden clusters (integrated_leiden_clus) in the cellular metadata. +A Giotto object with integrated UMAP (integrated.umap) within the +dimension_reduction slot and Leiden clusters (integrated_leiden_clus) in the +cellular metadata. 
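A brief usage sketch of the value described above, under the assumption that a Giotto object `g` has already been processed with runWNN(); pDataDT() and spatPlot2D() are used here only for inspection and plotting.

# Assumes `g` is a Giotto object already processed with runWNN();
# the column and reduction names are those listed in the value above.
cell_meta <- pDataDT(g)
table(cell_meta$integrated_leiden_clus)   # Leiden clusters added to cell metadata

# visualize the integrated clusters in space
spatPlot2D(g, cell_color = "integrated_leiden_clus")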
}
\description{
Multi omics integration with WNN
}
diff --git a/man/run_spatial_sim_tests_multi.Rd b/man/run_spatial_sim_tests_multi.Rd
index b050d1ccb..b44e0c6ae 100644
--- a/man/run_spatial_sim_tests_multi.Rd
+++ b/man/run_spatial_sim_tests_multi.Rd
@@ -27,6 +27,7 @@ run_spatial_sim_tests_multi(
)
}
\description{
-runs all spatial tests for multiple probabilities and repetitions
+runs all spatial tests for multiple probabilities and
+repetitions
}
\keyword{internal}
diff --git a/man/runtSNE.Rd b/man/runtSNE.Rd
index 1702d1811..c4861b2e6 100644
--- a/man/runtSNE.Rd
+++ b/man/runtSNE.Rd
@@ -72,7 +72,8 @@ giotto object with updated tSNE dimension recuction
run tSNE
}
\details{
-See \code{\link[Rtsne]{Rtsne}} for more information about these and other parameters. \cr
+See \code{\link[Rtsne]{Rtsne}} for more information about these and
+other parameters. \cr
\itemize{
\item Input for tSNE dimension reduction can be another dimension reduction (default = 'pca')
\item To use gene expression as input set dim_reduction_to_use = NULL
diff --git a/man/sampling_sp_genes.Rd b/man/sampling_sp_genes.Rd
index 459aa6b0e..3d542da7e 100644
--- a/man/sampling_sp_genes.Rd
+++ b/man/sampling_sp_genes.Rd
@@ -19,8 +19,11 @@ sampling_sp_genes(clust, sample_rate = 2, target = 500, seed = 10)
function to select a set of spatial genes
}
\details{
-This function samples a subset of spatial genes among different clusters, with size n = target.
-Number of samples from each cluster denpends on the relative proportion of each cluster.
-Changing from equal size by setting sample_rate = 1 to with exact proportion of each cluster by setting sample_rate = +Inf
+This function samples a subset of spatial genes among different clusters,
+with size n = target.
+Number of samples from each cluster depends on the relative proportion of
+each cluster.
+Changing from equal size by setting sample_rate = 1 to exact proportion
+of each cluster by setting sample_rate = +Inf
}
\keyword{internal}
diff --git a/man/screePlot.Rd b/man/screePlot.Rd
index e6cc731a6..c5763e5ea 100644
--- a/man/screePlot.Rd
+++ b/man/screePlot.Rd
@@ -72,12 +72,13 @@ screePlot(
ggplot object for scree method
}
\description{
-identify significant principal components (PCs) using an screeplot (a.k.a. elbowplot)
+identify significant principal components (PCs) using a
+screeplot (a.k.a. elbowplot)
}
\details{
Screeplot works by plotting the explained variance of each
- individual PC in a barplot allowing you to identify which PC provides a significant
- contribution (a.k.a 'elbow method'). \cr
- Screeplot will use an available pca object, based on the parameter 'name', or it will
- create it if it's not available (see \code{\link{runPCA}})
+ individual PC in a barplot allowing you to identify which PC provides a
+ significant contribution (a.k.a 'elbow method'). \cr
+ Screeplot will use an available pca object, based on the parameter 'name',
+ or it will create it if it's not available (see \code{\link{runPCA}})
}
diff --git a/man/selectPatternGenes.Rd b/man/selectPatternGenes.Rd
index 39ba9d89c..ad2df7b6f 100644
--- a/man/selectPatternGenes.Rd
+++ b/man/selectPatternGenes.Rd
@@ -27,7 +27,8 @@ selectPatternGenes(
\item{min_neg_cor}{Minimum negative correlation score to include a gene.}

-\item{return_top_selection}{only return selection based on correlation criteria (boolean)}
+\item{return_top_selection}{only return selection based on correlation
+criteria (boolean)}
}
\value{
Data.table with genes associated with selected dimension (PC).
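The elbow idea described for screePlot above can be sketched directly from a set of eigenvalues. This reuses the mini dimObj from the create_screeplot example earlier in these docs (it assumes GiottoData is installed) and only illustrates the concept, not screePlot()'s internals.

# Percent variance explained per PC from stored eigenvalues (mini object as in
# the create_screeplot() example); the bend in this barplot is the 'elbow'.
dr  <- GiottoData::loadSubObjectMini("dimObj")
eig <- methods::slot(dr, "misc")$eigenvalues

var_expl <- 100 * eig / sum(eig)
barplot(var_expl,
    names.arg = seq_along(var_expl),
    xlab = "PC", ylab = "% variance explained"
)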
diff --git a/man/showCellProportionSwitchedPie.Rd b/man/showCellProportionSwitchedPie.Rd index afced4e45..b1361c045 100644 --- a/man/showCellProportionSwitchedPie.Rd +++ b/man/showCellProportionSwitchedPie.Rd @@ -21,6 +21,8 @@ showCellProportionSwitchedPie( showCellProportionSwitchedPie } \details{ -Creates a pie chart showing how many cells switched clusters after annotation resizing. -The function showPolygonSizeInfluence() must have been run on the Giotto Object for this function to run. +Creates a pie chart showing how many cells switched clusters after +annotation resizing. +The function showPolygonSizeInfluence() must have been run on the Giotto +Object for this function to run. } diff --git a/man/showCellProportionSwitchedSanKey.Rd b/man/showCellProportionSwitchedSanKey.Rd index f33a1b0f9..7c8893c64 100644 --- a/man/showCellProportionSwitchedSanKey.Rd +++ b/man/showCellProportionSwitchedSanKey.Rd @@ -12,11 +12,13 @@ showCellProportionSwitchedSanKey( ) } \arguments{ -\item{gobject}{giotto object which contains metadata for both spat_unit and alt_spat_unit} +\item{gobject}{giotto object which contains metadata for both spat_unit and +alt_spat_unit} \item{spat_unit}{spatial unit} -\item{alt_spat_unit}{alternative spatial unit which stores data after resizing annotations} +\item{alt_spat_unit}{alternative spatial unit which stores data after +resizing annotations} \item{feat_type}{feature type} } diff --git a/man/showPattern.Rd b/man/showPattern.Rd index 576367ab3..e8915f083 100644 --- a/man/showPattern.Rd +++ b/man/showPattern.Rd @@ -23,8 +23,10 @@ showPattern(gobject, spatPatObj, ...) \item{\code{show_plot}}{show plot} \item{\code{return_plot}}{return ggplot object} \item{\code{save_plot}}{directly save the plot [boolean]} - \item{\code{save_param}}{list of saving parameters, see \code{\link{showSaveParameters}}} - \item{\code{default_save_name}}{default save name for saving, don't change, change save_name in save_param} + \item{\code{save_param}}{list of saving parameters, see +\code{\link{showSaveParameters}}} + \item{\code{default_save_name}}{default save name for saving, don't change, +change save_name in save_param} }} } \value{ diff --git a/man/showPattern2D.Rd b/man/showPattern2D.Rd index 7fb3438c6..8bfb2917d 100644 --- a/man/showPattern2D.Rd +++ b/man/showPattern2D.Rd @@ -11,7 +11,7 @@ showPattern2D( trim = c(0.02, 0.98), background_color = "white", grid_border_color = "grey", - show_legend = T, + show_legend = TRUE, point_size = 1, show_plot = NULL, return_plot = NULL, @@ -43,9 +43,11 @@ showPattern2D( \item{save_plot}{directly save the plot [boolean]} -\item{save_param}{list of saving parameters, see \code{\link{showSaveParameters}}} +\item{save_param}{list of saving parameters, see +\code{\link{showSaveParameters}}} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, +change save_name in save_param} } \value{ ggplot diff --git a/man/showPattern3D.Rd b/man/showPattern3D.Rd index 589fc1729..6d5354e91 100644 --- a/man/showPattern3D.Rd +++ b/man/showPattern3D.Rd @@ -11,7 +11,7 @@ showPattern3D( trim = c(0.02, 0.98), background_color = "white", grid_border_color = "grey", - show_legend = T, + show_legend = TRUE, point_size = 1, axis_scale = c("cube", "real", "custom"), custom_ratio = NULL, @@ -58,9 +58,11 @@ showPattern3D( \item{save_plot}{directly save the plot [boolean]} -\item{save_param}{list of saving parameters, see \code{\link{showSaveParameters}}} 
+\item{save_param}{list of saving parameters, see
+\code{\link{showSaveParameters}}}

-\item{default_save_name}{default save name for saving, don't change, change save_name in save_param}
+\item{default_save_name}{default save name for saving, don't change,
+change save_name in save_param}
}
\value{
plotly
diff --git a/man/showPatternGenes.Rd b/man/showPatternGenes.Rd
index dfd346784..82dfb2a1d 100644
--- a/man/showPatternGenes.Rd
+++ b/man/showPatternGenes.Rd
@@ -32,7 +32,8 @@ showPatternGenes(
\item{point_size}{size of points}

-\item{return_DT}{if TRUE, it will return the data.table used to generate the plots}
+\item{return_DT}{if TRUE, it will return the data.table used to generate
+the plots}

\item{show_plot}{show plot}
@@ -40,9 +41,11 @@ showPatternGenes(
\item{save_plot}{directly save the plot [boolean]}

-\item{save_param}{list of saving parameters, see \code{\link{showSaveParameters}}}
+\item{save_param}{list of saving parameters, see
+\code{\link{showSaveParameters}}}

-\item{default_save_name}{default save name for saving, don't change, change save_name in save_param}
+\item{default_save_name}{default save name for saving, don't change,
+change save_name in save_param}
}
\value{
ggplot
diff --git a/man/showPolygonSizeInfluence.Rd b/man/showPolygonSizeInfluence.Rd
index 175d5aa21..c3562b6a8 100644
--- a/man/showPolygonSizeInfluence.Rd
+++ b/man/showPolygonSizeInfluence.Rd
@@ -19,11 +19,13 @@ showPolygonSizeInfluence(
\item{spat_unit}{spatial unit}

-\item{alt_spat_unit}{alternaitve spatial unit which represents resized polygon data}
+\item{alt_spat_unit}{alternative spatial unit which represents resized
+polygon data}

\item{feat_type}{feature type}

-\item{clus_name}{name of cluster column in cell_metadata for given spat_unit and alt_spat_unit, i.e. "kmeans"}
+\item{clus_name}{name of cluster column in cell_metadata for given spat_unit
+and alt_spat_unit, i.e. "kmeans"}

\item{return_plot}{logical. whether to return the plot object}
@@ -41,11 +43,12 @@ Compares cell metadata from spat_unit-feat_type pairs as provided.
New columns, resize_switch and cluster_interaction, will be created within
cell_metadata for spat_unit-feat_type.
-These new columns will describe if a given cell switched cluster number when resized.
+These new columns will describe if a given cell switched cluster number when
+resized.
If the same amount of clusters exist for spat_unit-feat_type and
alt_spat_unit-feat_type, then clusters are determined to be corresponding based
on % overlap in cell_IDs in each cluster.
-Otherwise, multiple clusters from the spatial unit feature type pair are condensed
-to align with the smaller number of clusters and ensure overlap.
+Otherwise, multiple clusters from the spatial unit feature type pair are
+condensed to align with the smaller number of clusters and ensure overlap.
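To make the "% overlap in cell_IDs" matching described above concrete, here is a small base-R sketch with hypothetical labels; it illustrates the matching idea only and is not showPolygonSizeInfluence()'s implementation.

# Hypothetical cluster labels for the same five cells under the original and
# the resized spatial unit, matched by the fraction of shared cell IDs.
clus_original <- c(a = 1, b = 1, c = 2, d = 2, e = 3)
clus_resized  <- c(a = 1, b = 1, c = 1, d = 2, e = 2)

overlap     <- table(clus_original, clus_resized)   # shared cells per cluster pair
overlap_pct <- prop.table(overlap, margin = 1)      # % overlap per original cluster

# resized cluster that each original cluster overlaps most
best_match <- colnames(overlap_pct)[apply(overlap_pct, 1, which.max)]
names(best_match) <- rownames(overlap_pct)
best_match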
}
diff --git a/man/showSpatialCorFeats.Rd b/man/showSpatialCorFeats.Rd
index 4d683c4ac..857968661 100644
--- a/man/showSpatialCorFeats.Rd
+++ b/man/showSpatialCorFeats.Rd
@@ -29,9 +29,11 @@ showSpatialCorFeats(
\item{min_expr_cor}{filter on minimum single-cell expression correlation}

-\item{min_cor_diff}{filter on minimum correlation difference (spatial vs expression)}
+\item{min_cor_diff}{filter on minimum correlation difference
+(spatial vs expression)}

-\item{min_rank_diff}{filter on minimum correlation rank difference (spatial vs expression)}
+\item{min_rank_diff}{filter on minimum correlation rank difference
+(spatial vs expression)}

\item{show_top_feats}{show top features per gene}
}
diff --git a/man/showSpatialCorGenes.Rd b/man/showSpatialCorGenes.Rd
index ba78421fa..f44934e5b 100644
--- a/man/showSpatialCorGenes.Rd
+++ b/man/showSpatialCorGenes.Rd
@@ -29,9 +29,11 @@ showSpatialCorGenes(
\item{min_expr_cor}{filter on minimum single-cell expression correlation}

-\item{min_cor_diff}{filter on minimum correlation difference (spatial vs expression)}
+\item{min_cor_diff}{filter on minimum correlation difference
+(spatial vs expression)}

-\item{min_rank_diff}{filter on minimum correlation rank difference (spatial vs expression)}
+\item{min_rank_diff}{filter on minimum correlation rank difference
+(spatial vs expression)}

\item{show_top_genes}{show top genes per gene}
}
diff --git a/man/signPCA.Rd b/man/signPCA.Rd
index 2e1f6132e..eb0469bd1 100644
--- a/man/signPCA.Rd
+++ b/man/signPCA.Rd
@@ -84,12 +84,14 @@ ggplot object for scree method and maxtrix of p-values for jackstraw
identify significant prinicipal components (PCs)
}
\details{
-Two different methods can be used to assess the number of relevant or significant
- prinicipal components (PC's). \cr
+Two different methods can be used to assess the number of relevant
+or significant principal components (PC's). \cr
1. Screeplot works by plotting the explained variance of each
- individual PC in a barplot allowing you to identify which PC provides a significant
+ individual PC in a barplot allowing you to identify which PC provides a
+ significant
contribution (a.k.a. 'elbow method'). \cr
- 2. The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}} function. By
- systematically permuting genes it identifies robust, and thus significant, PCs.
+ 2. The Jackstraw method uses the \code{\link[jackstraw]{permutationPA}}
+ function. By systematically permuting genes it identifies robust, and thus
+ significant, PCs.
\cr
}
diff --git a/man/silhouetteRank.Rd b/man/silhouetteRank.Rd
index e3502d2ec..16dc29975 100644
--- a/man/silhouetteRank.Rd
+++ b/man/silhouetteRank.Rd
@@ -33,9 +33,13 @@ silhouetteRank(
data.table with spatial scores
}
\description{
-Previously: calculate_spatial_genes_python. This method computes a silhouette score per gene based on the
-spatial distribution of two partitions of cells (expressed L1, and non-expressed L0).
-Here, rather than L2 Euclidean norm, it uses a rank-transformed, exponentially weighted
+Previously: calculate_spatial_genes_python. This method
+computes a silhouette score per gene based on the
+spatial distribution of two partitions of cells
+(expressed L1, and non-expressed L0).
+Here, rather than L2 Euclidean norm, it uses a rank-transformed,
+exponentially weighted
function to represent the local physical distance between two cells.
-New multi aggregator implementation can be found at \code{\link{silhouetteRankTest}} +New multi aggregator implementation can be found at +\code{\link{silhouetteRankTest}} } diff --git a/man/silhouetteRankTest.Rd b/man/silhouetteRankTest.Rd index 1741690e6..6413f937c 100644 --- a/man/silhouetteRankTest.Rd +++ b/man/silhouetteRankTest.Rd @@ -48,5 +48,6 @@ silhouetteRankTest( data.table with spatial scores } \description{ -Multi parameter aggregator version of \code{\link{silhouetteRank}} +Multi parameter aggregator version of +\code{\link{silhouetteRank}} } diff --git a/man/simulateOneGenePatternGiottoObject.Rd b/man/simulateOneGenePatternGiottoObject.Rd index 6e7030c16..a937ca88f 100644 --- a/man/simulateOneGenePatternGiottoObject.Rd +++ b/man/simulateOneGenePatternGiottoObject.Rd @@ -25,7 +25,8 @@ simulateOneGenePatternGiottoObject( \item{gene_name}{selected gene} -\item{spatial_prob}{probability for a high expressing gene value to be part of the spatial pattern} +\item{spatial_prob}{probability for a high expressing gene value to be +part of the spatial pattern} \item{gradient_direction}{direction of gradient} @@ -36,7 +37,8 @@ simulateOneGenePatternGiottoObject( \item{\dots}{additional parameters for (re-)normalizing} } \value{ -Reprocessed Giotto object for which one gene has a forced spatial pattern +Reprocessed Giotto object for which one gene has a forced +spatial pattern } \description{ Create a simulated spatial pattern for one selected gnee diff --git a/man/spark.Rd b/man/spark.Rd index 81dc89014..79f57f9e1 100644 --- a/man/spark.Rd +++ b/man/spark.Rd @@ -35,11 +35,13 @@ spark( \item{num_core}{number of cores to use} -\item{covariates}{The covariates in experiments, i.e. confounding factors/batch effect. Column name of giotto cell metadata.} +\item{covariates}{The covariates in experiments, i.e. confounding +factors/batch effect. Column name of giotto cell metadata.} \item{return_object}{type of result to return (data.table or spark object)} -\item{\dots}{Additional parameters to the \code{\link[SPARK]{spark.vc}} function} +\item{\dots}{Additional parameters to the \code{\link[SPARK]{spark.vc}} +function} } \value{ data.table with SPARK spatial genes results or the SPARK object @@ -48,7 +50,8 @@ data.table with SPARK spatial genes results or the SPARK object Compute spatially expressed genes with SPARK method } \details{ -This function is a wrapper for the method implemented in the SPARK package: +This function is a wrapper for the method implemented in the +SPARK package: \itemize{ \item{1. CreateSPARKObject }{create a SPARK object from a Giotto object} \item{2. 
spark.vc }{ Fits the count-based spatial model to estimate the parameters, diff --git a/man/spatCellCellcom.Rd b/man/spatCellCellcom.Rd index 3751aee4d..166323526 100644 --- a/man/spatCellCellcom.Rd +++ b/man/spatCellCellcom.Rd @@ -35,7 +35,8 @@ spatCellCellcom( \item{spat_unit}{spatial unit} -\item{spatial_network_name}{spatial network to use for identifying interacting cells} +\item{spatial_network_name}{spatial network to use for identifying +interacting cells} \item{cluster_column}{cluster column with cell type information} @@ -51,9 +52,11 @@ spatCellCellcom( \item{log2FC_addendum}{addendum to add when calculating log2FC} -\item{min_observations}{minimum number of interactions needed to be considered} +\item{min_observations}{minimum number of interactions needed to be +considered} -\item{detailed}{provide more detailed information (random variance and z-score)} +\item{detailed}{provide more detailed information +(random variance and z-score)} \item{adjust_method}{which method to adjust p-values} @@ -70,15 +73,19 @@ spatCellCellcom( \item{verbose}{verbose} } \value{ -Cell-Cell communication scores for feature pairs based on spatial interaction +Cell-Cell communication scores for feature pairs based on spatial +interaction } \description{ -Spatial Cell-Cell communication scores based on spatial expression of interacting cells +Spatial Cell-Cell communication scores based on spatial +expression of interacting cells } \details{ -Statistical framework to identify if pairs of genes (such as ligand-receptor combinations) -are expressed at higher levels than expected based on a reshuffled null distribution -of feature expression values in cells that are spatially in proximity to eachother.. +Statistical framework to identify if pairs of genes +(such as ligand-receptor combinations) +are expressed at higher levels than expected based on a reshuffled null +distribution of feature expression values in cells that are spatially in +proximity to each other. \itemize{ \item{LR_comb:}{Pair of ligand and receptor} \item{lig_cell_type:}{ cell type to assess expression level of ligand } diff --git a/man/spatCellCellcomSpots.Rd b/man/spatCellCellcomSpots.Rd index e318f6d26..6c219f697 100644 --- a/man/spatCellCellcomSpots.Rd +++ b/man/spatCellCellcomSpots.Rd @@ -36,7 +36,8 @@ spatCellCellcomSpots( \item{ave_celltype_exp}{Matrix with average expression per cell type} -\item{spatial_network_name}{spatial network to use for identifying interacting cells} +\item{spatial_network_name}{spatial network to use for identifying +interacting cells} \item{cluster_column}{cluster column with cell type information} @@ -46,15 +47,18 @@ spatCellCellcomSpots( \item{feature_set_2}{second specific feature set from feature pairs} -\item{min_observations}{minimum number of interactions needed to be considered} +\item{min_observations}{minimum number of interactions needed to be +considered} \item{expression_values}{(e.g. 'normalized', 'scaled', 'custom')} -\item{detailed}{provide more detailed information (random variance and z-score)} +\item{detailed}{provide more detailed information +(random variance and z-score)} \item{adjust_method}{which method to adjust p-values} -\item{adjust_target}{adjust multiple hypotheses at the cell or feature level} +\item{adjust_target}{adjust multiple hypotheses at the cell or feature +level} \item{do_parallel}{run calculations in parallel with mclapply} @@ -67,15 +71,18 @@ spatCellCellcomSpots( \item{verbose}{verbose (e.g. 
'a little', 'a lot', 'none')} } \value{ -Cell-Cell communication scores for feature pairs based on spatial interaction +Cell-Cell communication scores for feature pairs based on spatial +interaction } \description{ -Spatial Cell-Cell communication scores based on spatial expression of interacting cells at spots resolution +Spatial Cell-Cell communication scores based on spatial +expression of interacting cells at spots resolution } \details{ -Statistical framework to identify if pairs of features (such as ligand-receptor combinations) -are expressed at higher levels than expected based on a reshuffled null distribution -of feature expression values in cells that are spatially in proximity to eachother.. +Statistical framework to identify if pairs of features +(such as ligand-receptor combinations) are expressed at higher levels than +expected based on a reshuffled null distribution of feature expression +values in cells that are spatially in proximity to each other. \itemize{ \item{LR_comb:}{Pair of ligand and receptor} \item{lig_cell_type:}{ cell type to assess expression level of ligand } diff --git a/man/spat_fisher_exact.Rd b/man/spat_fisher_exact.Rd index 6934211f7..24a3cdb78 100644 --- a/man/spat_fisher_exact.Rd +++ b/man/spat_fisher_exact.Rd @@ -6,12 +6,12 @@ \alias{.spat_fish_func_dt} \title{Fisher exact test} \usage{ -.spat_fish_func(feat, bin_matrix, spat_mat, calc_hub = F, hub_min_int = 3) +.spat_fish_func(feat, bin_matrix, spat_mat, calc_hub = FALSE, hub_min_int = 3) .spat_fish_func_dt( bin_matrix_DTm, spat_netw_min, - calc_hub = F, + calc_hub = FALSE, hub_min_int = 3, cores = NA ) diff --git a/man/spat_odds_ratio.Rd b/man/spat_odds_ratio.Rd index 6b3184b6d..f3302a4f9 100644 --- a/man/spat_odds_ratio.Rd +++ b/man/spat_odds_ratio.Rd @@ -11,7 +11,7 @@ .spat_or_func_dt( bin_matrix_DTm, spat_netw_min, - calc_hub = F, + calc_hub = FALSE, hub_min_int = 3, cores = NA ) diff --git a/man/spatialAEH.Rd b/man/spatialAEH.Rd index 7c7c3a071..744102ef9 100644 --- a/man/spatialAEH.Rd +++ b/man/spatialAEH.Rd @@ -48,5 +48,6 @@ An updated giotto object Compute spatial variable genes with spatialDE method } \details{ -This function is a wrapper for the SpatialAEH method implemented in the ... +This function is a wrapper for the SpatialAEH method +implemented in the ... } diff --git a/man/spatialAutoCor.Rd b/man/spatialAutoCor.Rd index 634664ef9..ca84a5483 100644 --- a/man/spatialAutoCor.Rd +++ b/man/spatialAutoCor.Rd @@ -60,8 +60,8 @@ spatialAutoCorLocal( \item{method}{method of autocorrelation. 
See details (default = 'moran')} -\item{data_to_use}{if using data from gobject, whether to test using expression -('expression') or cell metadata ('cell_meta')} +\item{data_to_use}{if using data from gobject, whether to test using +expression ('expression') or cell metadata ('cell_meta')} \item{expression_values}{name of expression information to use} @@ -69,21 +69,22 @@ spatialAutoCorLocal( \item{spatial_network_to_use}{spatial network to use} -\item{wm_method}{type of weight matrix to generate from spatial network if no -weight matrix is found attached to the spatial network} +\item{wm_method}{type of weight matrix to generate from spatial network +if no weight matrix is found attached to the spatial network} \item{wm_name}{name of attached weight matrix to use} -\item{node_values}{alternative method of directly supplying a set of node values} +\item{node_values}{alternative method of directly supplying a set of node +values} -\item{weight_matrix}{alternative method of directly supplying a spatial weight -matrix} +\item{weight_matrix}{alternative method of directly supplying a spatial +weight matrix} \item{test_method}{method to test values for significance (default is no testing)} -\item{mc_nsim}{when \code{test_method = 'monte_carlo'} this is number of simulations -to perform} +\item{mc_nsim}{when \code{test_method = 'monte_carlo'} this is number of +simulations to perform} \item{cor_name}{name to assign the results in global autocorrelation output} @@ -91,19 +92,21 @@ to perform} \item{verbose}{be verbose} -\item{enrich_name}{name to assign local autocorrelation spatial enrichment results} +\item{enrich_name}{name to assign local autocorrelation spatial enrichment +results} \item{output}{'spatEnrObj' or 'data.table' metadata in the giotto object or as a data.table} } \description{ -Find spatial autocorrelation. Note that \code{spatialAutoCorGlobal} -will return values as a data.table instead of appending information to the gobject. -\code{spatialAutoCorLocal} will append the results as a spatial enrichment object -by default. \cr -If providing external data using either the \code{node_values} and/or \code{weight_matrix} -params, the order of values provided should be the same as the ordering of the -columns and rows of the weight matrix. +Find spatial autocorrelation. Note that +\code{spatialAutoCorGlobal} will return values as a data.table instead of +appending information to the gobject. +\code{spatialAutoCorLocal} will append the results as a spatial enrichment +object by default. \cr +If providing external data using either the \code{node_values} and/or +\code{weight_matrix} params, the order of values provided should be the +same as the ordering of the columns and rows of the weight matrix. } \details{ \strong{Global Methods:} @@ -124,6 +127,7 @@ columns and rows of the weight matrix. 
\itemize{ \item \code{spatialAutoCorGlobal()}: Global autocorrelation (single value returned) -\item \code{spatialAutoCorLocal()}: Local autocorrelation (values generated for each spatial ID) +\item \code{spatialAutoCorLocal()}: Local autocorrelation +(values generated for each spatial ID) }} diff --git a/man/spatialDE.Rd b/man/spatialDE.Rd index 91ec9b013..995dce02d 100644 --- a/man/spatialDE.Rd +++ b/man/spatialDE.Rd @@ -49,9 +49,11 @@ spatialDE( \item{save_plot}{directly save the plot [boolean]} -\item{save_param}{list of saving parameters, see \code{\link{showSaveParameters}}} +\item{save_param}{list of saving parameters, see +\code{\link{showSaveParameters}}} -\item{default_save_name}{default save name for saving, don't change, change save_name in save_param} +\item{default_save_name}{default save name for saving, don't change, +change save_name in save_param} } \value{ a list of data.frames with results and plot (optional) @@ -60,6 +62,7 @@ a list of data.frames with results and plot (optional) Compute spatial variable genes with spatialDE method } \details{ -This function is a wrapper for the SpatialDE method originally implemented +This function is a wrapper for the SpatialDE method originally +implemented in python. See publication \doi{10.1038/nmeth.4636} } diff --git a/man/spatialSplitCluster.Rd b/man/spatialSplitCluster.Rd index 8bbcdce33..f7caadb11 100644 --- a/man/spatialSplitCluster.Rd +++ b/man/spatialSplitCluster.Rd @@ -33,13 +33,14 @@ Split cluster annotations based on a spatial network } \examples{ library(Giotto) -g = GiottoData::loadGiottoMini('vizgen') -activeSpatUnit(g) = 'aggregate' -spatPlot2D(g, cell_color = 'leiden_clus') +g <- GiottoData::loadGiottoMini("vizgen") +activeSpatUnit(g) <- "aggregate" +spatPlot2D(g, cell_color = "leiden_clus") -g = spatialSplitCluster(g, - cluster_col = 'leiden_clus', - split_clus_name = 'new') +g <- spatialSplitCluster(g, + cluster_col = "leiden_clus", + split_clus_name = "new" +) # don't show legend since there are too many categories generated -spatPlot2D(g, cell_color = 'new', show_legend = FALSE) +spatPlot2D(g, cell_color = "new", show_legend = FALSE) } diff --git a/man/spdepAutoCorr.Rd b/man/spdepAutoCorr.Rd index 8ee31b7b7..afb0ab84e 100644 --- a/man/spdepAutoCorr.Rd +++ b/man/spdepAutoCorr.Rd @@ -19,7 +19,8 @@ spdepAutoCorr( \item{gobject}{Input a Giotto object.} \item{method}{Specify a method name to compute auto correlation. -Available methods include \code{"geary.test", "lee.test", "lm.morantest","moran.test"}.} +Available methods include +\code{"geary.test", "lee.test", "lm.morantest","moran.test"}.} \item{spat_unit}{spatial unit} @@ -27,7 +28,8 @@ Available methods include \code{"geary.test", "lee.test", "lm.morantest","moran. \item{expression_values}{expression values to use, default = normalized} -\item{spatial_network_to_use}{spatial network to use, default = spatial_network} +\item{spatial_network_to_use}{spatial network to use, +default = spatial_network} \item{return_gobject}{if FALSE, results are returned as data.table. 
If TRUE, values will be appended to feature metadata}
diff --git a/man/specificCellCellcommunicationScores.Rd b/man/specificCellCellcommunicationScores.Rd
index 4a4f0dd64..71e436cb1 100644
--- a/man/specificCellCellcommunicationScores.Rd
+++ b/man/specificCellCellcommunicationScores.Rd
@@ -25,7 +25,7 @@ specificCellCellcommunicationScores(
   adjust_target = c("feats", "cells"),
   set_seed = FALSE,
   seed_number = 1234,
-  verbose = T
+  verbose = TRUE
 )
 }
 \arguments{
@@ -35,7 +35,8 @@ specificCellCellcommunicationScores(
 \item{spat_unit}{spatial unit}
 
-\item{spatial_network_name}{spatial network to use for identifying interacting cells}
+\item{spatial_network_name}{spatial network to use for identifying
+interacting cells}
 
 \item{cluster_column}{cluster column with cell type information}
 
@@ -55,9 +56,11 @@ specificCellCellcommunicationScores(
 \item{log2FC_addendum}{addendum to add when calculating log2FC}
 
-\item{min_observations}{minimum number of interactions needed to be considered}
+\item{min_observations}{minimum number of interactions needed to be
+considered}
 
-\item{detailed}{provide more detailed information (random variance and z-score)}
+\item{detailed}{provide more detailed information
+(random variance and z-score)}
 
 \item{adjust_method}{which method to adjust p-values}
 
@@ -70,15 +73,19 @@ specificCellCellcommunicationScores(
 \item{verbose}{verbose}
 }
 \value{
-Cell-Cell communication scores for feature pairs based on spatial interaction
+Cell-Cell communication scores for feature pairs based on spatial
+interaction
 }
 \description{
-Specific Cell-Cell communication scores based on spatial expression of interacting cells
+Specific Cell-Cell communication scores based on spatial
+expression of interacting cells
 }
 \details{
-Statistical framework to identify if pairs of features (such as ligand-receptor combinations)
-are expressed at higher levels than expected based on a reshuffled null distribution
-of feature expression values in cells that are spatially in proximity to eachother.
+Statistical framework to identify if pairs of features
+(such as ligand-receptor combinations)
+are expressed at higher levels than expected based on a reshuffled null
+distribution of feature expression values in cells that are spatially in
+proximity to each other.
\itemize{ \item{LR_comb:}{Pair of ligand and receptor} \item{lig_cell_type:}{ cell type to assess expression level of ligand } diff --git a/man/subClusterCells.Rd b/man/subClusterCells.Rd index 92a1fb7c1..bd11bf637 100644 --- a/man/subClusterCells.Rd +++ b/man/subClusterCells.Rd @@ -10,13 +10,13 @@ subClusterCells( cluster_method = c("leiden", "louvain_community", "louvain_multinet"), cluster_column = NULL, selected_clusters = NULL, - hvg_param = list(reverse_log_scale = T, difference_in_cov = 1, expression_values = + hvg_param = list(reverse_log_scale = TRUE, difference_in_cov = 1, expression_values = "normalized"), hvg_min_perc_cells = 5, hvg_mean_expr_det = 1, use_all_genes_as_hvg = FALSE, min_nr_of_hvg = 5, - pca_param = list(expression_values = "normalized", scale_unit = T), + pca_param = list(expression_values = "normalized", scale_unit = TRUE), nn_param = list(dimensions_to_use = 1:20), k_neighbors = 10, resolution = 1, @@ -27,7 +27,7 @@ subClusterCells( nn_network_to_use = "sNN", network_name = "sNN.pca", return_gobject = TRUE, - verbose = T + verbose = TRUE ) } \arguments{ @@ -45,11 +45,14 @@ subClusterCells( \item{hvg_min_perc_cells}{threshold for detection in min percentage of cells} -\item{hvg_mean_expr_det}{threshold for mean expression level in cells with detection} +\item{hvg_mean_expr_det}{threshold for mean expression level in cells with +detection} -\item{use_all_genes_as_hvg}{forces all genes to be HVG and to be used as input for PCA} +\item{use_all_genes_as_hvg}{forces all genes to be HVG and to be used as +input for PCA} -\item{min_nr_of_hvg}{minimum number of HVG, or all genes will be used as input for PCA} +\item{min_nr_of_hvg}{minimum number of HVG, or all genes will be used as +input for PCA} \item{pca_param}{parameters for runPCA} @@ -93,6 +96,7 @@ The systematic steps are: } } \seealso{ -\code{\link{.doLouvainCluster_multinet}}, \code{\link{.doLouvainCluster_community}} +\code{\link{.doLouvainCluster_multinet}}, +\code{\link{.doLouvainCluster_community}} and @seealso \code{\link{doLeidenCluster}} } diff --git a/man/trendSceek.Rd b/man/trendSceek.Rd index a20d15a27..b6cbbb0a9 100644 --- a/man/trendSceek.Rd +++ b/man/trendSceek.Rd @@ -29,11 +29,14 @@ trendSceek( \item{subset_genes}{subset of genes to run trendsceek on} -\item{nrand}{An integer specifying the number of random resamplings of the mark distribution as to create the null-distribution.} +\item{nrand}{An integer specifying the number of random resamplings of the +mark distribution as to create the null-distribution.} -\item{ncores}{An integer specifying the number of cores to be used by BiocParallel} +\item{ncores}{An integer specifying the number of cores to be used by +BiocParallel} -\item{\dots}{Additional parameters to the \code{\link[trendsceek]{trendsceek_test}} function} +\item{\dots}{Additional parameters to the +\code{\link[trendsceek]{trendsceek_test}} function} } \value{ data.frame with trendsceek spatial genes results @@ -42,6 +45,7 @@ data.frame with trendsceek spatial genes results Compute spatial variable genes with trendsceek method } \details{ -This function is a wrapper for the trendsceek_test method implemented in the trendsceek package +This function is a wrapper for the trendsceek_test method +implemented in the trendsceek package Publication: \doi{10.1038/nmeth.4634} } diff --git a/man/viewHMRFresults_V2.Rd b/man/viewHMRFresults_V2.Rd index b204f6335..2af120df6 100644 --- a/man/viewHMRFresults_V2.Rd +++ b/man/viewHMRFresults_V2.Rd @@ -63,6 +63,8 @@ viewHMRFresults_V2( function to 
view HMRF results with multiple betas } \details{ -This function plots spatial map of HMRF domain clusters for multiple beta with the name (hmrf_name), -matching the first part of the cell meta column names with HMRF clusters (for example name of ‘hmrf1 k=8 b=0.00’ is ‘hmrf1’) +This function plots spatial map of HMRF domain clusters for multiple beta +with the name (hmrf_name), +matching the first part of the cell meta column names with HMRF clusters +(for example name of ‘hmrf1 k=8 b=0.00’ is ‘hmrf1’) } diff --git a/man/visium_micron_scalefactor.Rd b/man/visium_micron_scalefactor.Rd index ed4a889f7..9c9f93949 100644 --- a/man/visium_micron_scalefactor.Rd +++ b/man/visium_micron_scalefactor.Rd @@ -8,7 +8,8 @@ .visium_micron_scale(json_scalefactors) } \arguments{ -\item{json_scalefactors}{list of scalefactors from .visium_read_scalefactors()} +\item{json_scalefactors}{list of scalefactors from +.visium_read_scalefactors()} } \value{ scale factor for converting pixel to micron diff --git a/man/writeHMRFresults.Rd b/man/writeHMRFresults.Rd index 942123902..95e9c5dd0 100644 --- a/man/writeHMRFresults.Rd +++ b/man/writeHMRFresults.Rd @@ -9,7 +9,7 @@ writeHMRFresults( HMRFoutput, k = NULL, betas_to_view = NULL, - print_command = F + print_command = FALSE ) } \arguments{ diff --git a/man/write_giotto_viewer_annotation.Rd b/man/write_giotto_viewer_annotation.Rd index 236e66bf8..ef5c40f51 100644 --- a/man/write_giotto_viewer_annotation.Rd +++ b/man/write_giotto_viewer_annotation.Rd @@ -21,6 +21,7 @@ write_giotto_viewer_annotation( write a .txt and .annot file for the selection annotation } \description{ -write out factor-like annotation data from a giotto object for the Viewer +write out factor-like annotation data from a giotto object for +the Viewer } \keyword{internal} diff --git a/tests/testthat.R b/tests/testthat.R index b328cae37..876cafa51 100644 --- a/tests/testthat.R +++ b/tests/testthat.R @@ -11,10 +11,10 @@ library(Giotto) # additional needed packages suppressWarnings({ - # needed for reading .gz files. - if(!require(R.utils)) { - install.packages('R.utils', repos = 'http://cran.us.r-project.org') - } + # needed for reading .gz files. 
+ if (!require(R.utils)) { + install.packages("R.utils", repos = "http://cran.us.r-project.org") + } }) diff --git a/tests/testthat/test_dataImports.R b/tests/testthat/test_dataImports.R index f1f5cf4fd..5adf3421c 100644 --- a/tests/testthat/test_dataImports.R +++ b/tests/testthat/test_dataImports.R @@ -9,19 +9,18 @@ # ------------------------------------ test_that("Expression matrix is read correctly", { + # getSpatialDataset + GiottoData::getSpatialDataset(dataset = "scRNA_mouse_brain", directory = paste0(getwd(), "/testdata/")) - # getSpatialDataset - GiottoData::getSpatialDataset(dataset = "scRNA_mouse_brain", directory = paste0(getwd(), "/testdata/")) + # readExprMatrix + expr_mat <- readExprMatrix(paste0(getwd(), "/testdata/brain_sc_expression_matrix.txt.gz")) - # readExprMatrix - expr_mat <- readExprMatrix(paste0(getwd(), "/testdata/brain_sc_expression_matrix.txt.gz")) + expect_s4_class(expr_mat, "dgCMatrix") + expect_equal(expr_mat@Dim, c(27998, 8039)) - expect_s4_class(expr_mat, "dgCMatrix") - expect_equal(expr_mat@Dim, c(27998, 8039)) - - # check a few genes - expect_equal(expr_mat@Dimnames[[1]][20], "Sgcz") - expect_equal(expr_mat@Dimnames[[1]][50], 'Zfp804a') + # check a few genes + expect_equal(expr_mat@Dimnames[[1]][20], "Sgcz") + expect_equal(expr_mat@Dimnames[[1]][50], "Zfp804a") }) # get10Xmatrix_h5 @@ -36,10 +35,9 @@ test_that("Expression matrix is read correctly", { # ----------------------------- # remove files after testing if (file.exists("./testdata/brain_sc_expression_matrix.txt.gz")) { - unlink("./testdata/brain_sc_expression_matrix.txt.gz") + unlink("./testdata/brain_sc_expression_matrix.txt.gz") } if (file.exists("./testdata/brain_sc_metadata.csv")) { - unlink("./testdata/brain_sc_metadata.csv") + unlink("./testdata/brain_sc_metadata.csv") } - diff --git a/tests/testthat/test_dimreduction.R b/tests/testthat/test_dimreduction.R index ddc86e8b1..ba679a77d 100644 --- a/tests/testthat/test_dimreduction.R +++ b/tests/testthat/test_dimreduction.R @@ -7,60 +7,56 @@ g <- GiottoData::loadGiottoMini("visium") # pca #### test_that("pca is calculated", { - rlang::local_options(lifecycle_verbosity = "quiet") - # remove dim reductions - g@dimension_reduction <- NULL - g@nn_network <- NULL + rlang::local_options(lifecycle_verbosity = "quiet") + # remove dim reductions + g@dimension_reduction <- NULL + g@nn_network <- NULL - g <- suppressWarnings(runPCA(g)) - x <- getDimReduction(g) - - checkmate::expect_class(x, "dimObj") - checkmate::expect_numeric(x$eigenvalues) - checkmate::expect_matrix(x$loadings) + g <- suppressWarnings(runPCA(g)) + x <- getDimReduction(g) + checkmate::expect_class(x, "dimObj") + checkmate::expect_numeric(x$eigenvalues) + checkmate::expect_matrix(x$loadings) }) test_that("projection pca is calculated", { - rlang::local_options(lifecycle_verbosity = "quiet") - # remove dim reductions - g@dimension_reduction <- NULL - g@nn_network <- NULL + rlang::local_options(lifecycle_verbosity = "quiet") + # remove dim reductions + g@dimension_reduction <- NULL + g@nn_network <- NULL - g <- suppressWarnings(runPCAprojection(g)) - x <- getDimReduction(g, name = "pca.projection") + g <- suppressWarnings(runPCAprojection(g)) + x <- getDimReduction(g, name = "pca.projection") - checkmate::expect_class(x, "dimObj") - checkmate::expect_numeric(x$eigenvalues) - checkmate::expect_matrix(x$loadings) + checkmate::expect_class(x, "dimObj") + checkmate::expect_numeric(x$eigenvalues) + checkmate::expect_matrix(x$loadings) }) # UMAP #### test_that("umap is calculated", { - 
rlang::local_options(lifecycle_verbosity = "quiet") - g <- setDimReduction(g, NULL, - spat_unit = "cell", - feat_type = "rna", - reduction = "cells", - reduction_method = "umap", - name = "umap") - expect_false("umap" %in% list_dim_reductions(g)$name) - - g <- runUMAP(g) - - expect_true("umap" %in% list_dim_reductions(g)$name) - - u <- getDimReduction(g, - spat_unit = "cell", - feat_type = "rna", - reduction = "cells", - reduction_method = "umap", - name = "umap") - - checkmate::expect_class(u, "dimObj") + rlang::local_options(lifecycle_verbosity = "quiet") + g <- setDimReduction(g, NULL, + spat_unit = "cell", + feat_type = "rna", + reduction = "cells", + reduction_method = "umap", + name = "umap" + ) + expect_false("umap" %in% list_dim_reductions(g)$name) + + g <- runUMAP(g) + + expect_true("umap" %in% list_dim_reductions(g)$name) + + u <- getDimReduction(g, + spat_unit = "cell", + feat_type = "rna", + reduction = "cells", + reduction_method = "umap", + name = "umap" + ) + + checkmate::expect_class(u, "dimObj") }) - - - - - diff --git a/tests/testthat/test_merFISH.R b/tests/testthat/test_merFISH.R index 646af91bc..1e5d5b6b7 100644 --- a/tests/testthat/test_merFISH.R +++ b/tests/testthat/test_merFISH.R @@ -4,18 +4,20 @@ rlang::local_options(lifecycle_verbosity = "quiet") # data setup (warning about non-existing dir expected) suppressWarnings( - GiottoData::getSpatialDataset(dataset = 'merfish_preoptic', directory = paste0(getwd(), '/testdata/')) + GiottoData::getSpatialDataset(dataset = "merfish_preoptic", directory = paste0(getwd(), "/testdata/")) ) -expr_path = './testdata/merFISH_3D_data_expression.txt.gz' -loc_path = './testdata/merFISH_3D_data_cell_locations.txt' -meta_path = './testdata/merFISH_3D_metadata.txt' +expr_path <- "./testdata/merFISH_3D_data_expression.txt.gz" +loc_path <- "./testdata/merFISH_3D_data_cell_locations.txt" +meta_path <- "./testdata/merFISH_3D_metadata.txt" # CREATE GIOTTO OBJECT -object <- createGiottoObject(expression = expr_path, - spatial_locs = loc_path, - verbose = FALSE) +object <- createGiottoObject( + expression = expr_path, + spatial_locs = loc_path, + verbose = FALSE +) ### TESTS FOR MERFISH MOUSE HYPOTHALMIC PREOPTIC REGION DATASET @@ -23,85 +25,80 @@ object <- createGiottoObject(expression = expr_path, test_that("Object initialization creates expected Giotto object", { - - # S4 object of class "giotto" - expect_s4_class(object, "giotto") - - # gobject contains S4 object of class "dgCMatrix" containing raw expression - expect_s4_class(slot(object@expression[["cell"]][["rna"]][["raw"]], "exprMat"), "dgCMatrix") - expect_true(all(slot(object@expression[["cell"]][["rna"]][["raw"]], "exprMat")@Dim == c(161, 73655))) - - # gobject contains S4 object "spatLocsObj" of dimensions 73655 x 4 containing spatial locations - st = getSpatialLocations(object, spat_unit = 'cell', name = 'raw', output = 'spatLocsObj') - expect_identical(st@coordinates, object@spatial_locs[["cell"]][["raw"]]@coordinates) - expect_s4_class(object@spatial_locs[["cell"]][["raw"]], "spatLocsObj") - expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimx"]], 73655) - expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimy"]], 73655) - expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimz"]], 73655) - expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["cell_ID"]], 73655) - + # S4 object of class "giotto" + expect_s4_class(object, "giotto") + + # gobject contains S4 object of class "dgCMatrix" containing raw expression + 
expect_s4_class(slot(object@expression[["cell"]][["rna"]][["raw"]], "exprMat"), "dgCMatrix") + expect_true(all(slot(object@expression[["cell"]][["rna"]][["raw"]], "exprMat")@Dim == c(161, 73655))) + + # gobject contains S4 object "spatLocsObj" of dimensions 73655 x 4 containing spatial locations + st <- getSpatialLocations(object, spat_unit = "cell", name = "raw", output = "spatLocsObj") + expect_identical(st@coordinates, object@spatial_locs[["cell"]][["raw"]]@coordinates) + expect_s4_class(object@spatial_locs[["cell"]][["raw"]], "spatLocsObj") + expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimx"]], 73655) + expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimy"]], 73655) + expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimz"]], 73655) + expect_length(object@spatial_locs[["cell"]][["raw"]]@coordinates[["cell_ID"]], 73655) }) # READ IN METADATA -metadata = data.table::fread(meta_path) -object = addCellMetadata(object, new_metadata = metadata$layer_ID, vector_name = 'layer_ID') -object = addCellMetadata(object, new_metadata = metadata$orig_cell_types, vector_name = 'orig_cell_types') +metadata <- data.table::fread(meta_path) +object <- addCellMetadata(object, new_metadata = metadata$layer_ID, vector_name = "layer_ID") +object <- addCellMetadata(object, new_metadata = metadata$orig_cell_types, vector_name = "orig_cell_types") test_that("Cell metadata are read and added to Giotto object", { - # metadata col names - expect_named(metadata, c("orig_cell_types", "layer_ID")) + # metadata col names + expect_named(metadata, c("orig_cell_types", "layer_ID")) - # metadata length matches number of preexisting cell_IDs - expect_equal(nrow(metadata), length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["cell_ID"]])) - - # metadata length/types after added to object - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["layer_ID"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["layer_ID"]], "integer") - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["orig_cell_types"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["orig_cell_types"]], "character") + # metadata length matches number of preexisting cell_IDs + expect_equal(nrow(metadata), length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["cell_ID"]])) + # metadata length/types after added to object + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["layer_ID"]], 73655) + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["layer_ID"]], "integer") + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["orig_cell_types"]], 73655) + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["orig_cell_types"]], "character") }) # FILTER GIOTTO OBJECT -filtered_object <- filterGiotto(gobject = object, - expression_values = "raw", - expression_threshold = 1, - feat_det_in_min_cells = 50, - min_det_feats_per_cell = 50, - verbose = FALSE) +filtered_object <- filterGiotto( + gobject = object, + expression_values = "raw", + expression_threshold = 1, + feat_det_in_min_cells = 50, + min_det_feats_per_cell = 50, + verbose = FALSE +) test_that("Data in filtered object is expected size", { - - # filtered object expression values have expected dimensions - expect_true(all(slot(filtered_object@expression[["cell"]][["rna"]][["raw"]],'exprMat')@Dim == c(153, 17814))) - - # filtered object spatial locations have 
expected length - expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimx"]], 17814) - expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimy"]], 17814) - expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimz"]], 17814) - expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["cell_ID"]], 17814) - - # filtered object metadata has expected length - expect_length(slot(filtered_object@cell_metadata[["cell"]][["rna"]],'metaDT')[["layer_ID"]], 17814) - expect_length(slot(filtered_object@cell_metadata[["cell"]][["rna"]],'metaDT')[["orig_cell_types"]], 17814) - + # filtered object expression values have expected dimensions + expect_true(all(slot(filtered_object@expression[["cell"]][["rna"]][["raw"]], "exprMat")@Dim == c(153, 17814))) + + # filtered object spatial locations have expected length + expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimx"]], 17814) + expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimy"]], 17814) + expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["sdimz"]], 17814) + expect_length(filtered_object@spatial_locs[["cell"]][["raw"]]@coordinates[["cell_ID"]], 17814) + + # filtered object metadata has expected length + expect_length(slot(filtered_object@cell_metadata[["cell"]][["rna"]], "metaDT")[["layer_ID"]], 17814) + expect_length(slot(filtered_object@cell_metadata[["cell"]][["rna"]], "metaDT")[["orig_cell_types"]], 17814) }) # NORMALIZE GIOTTO OBJECT object <- normalizeGiotto(gobject = object, scalefactor = 10000, verbose = F) test_that("Normalized data added to giotto object", { + # gobject now also contains S4 object of class "dgCMatrix" containing normalized expression + expect_s4_class(slot(object@expression[["cell"]][["rna"]][["normalized"]], "exprMat"), "dgCMatrix") + expect_true(all(slot(object@expression[["cell"]][["rna"]][["normalized"]], "exprMat")@Dim == c(161, 73655))) - # gobject now also contains S4 object of class "dgCMatrix" containing normalized expression - expect_s4_class(slot(object@expression[["cell"]][["rna"]][["normalized"]], 'exprMat'), "dgCMatrix") - expect_true(all(slot(object@expression[["cell"]][["rna"]][["normalized"]], 'exprMat')@Dim == c(161, 73655))) - - # gobject now also contains S4 object of class "dgeMatrix" containing scaled expression - expect_s4_class(slot(object@expression[["cell"]][["rna"]][["scaled"]], 'exprMat'), "dgeMatrix") - expect_true(all(slot(object@expression[["cell"]][["rna"]][["scaled"]], 'exprMat')@Dim == c(161, 73655))) - + # gobject now also contains S4 object of class "dgeMatrix" containing scaled expression + expect_s4_class(slot(object@expression[["cell"]][["rna"]][["scaled"]], "exprMat"), "dgeMatrix") + expect_true(all(slot(object@expression[["cell"]][["rna"]][["scaled"]], "exprMat")@Dim == c(161, 73655))) }) @@ -109,170 +106,170 @@ test_that("Normalized data added to giotto object", { object <- addStatistics(gobject = object) test_that("Feature and cell statistics are added to giotto object", { - - # gobject cell metadata contains nr_feats, perc_feats, total_expr - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["nr_feats"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["nr_feats"]], "integer") - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["perc_feats"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["perc_feats"]], 
"double") - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["total_expr"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["total_expr"]], "double") - - # gobject feat metadata contains nr_cells, perc_cells, total_expr, mean_expr, mean_expr_det - expect_length(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["nr_cells"]], 161) - expect_type(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["nr_cells"]], "integer") - expect_length(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["perc_cells"]], 161) - expect_type(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["perc_cells"]], "double") - expect_length(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["total_expr"]], 161) - expect_type(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["total_expr"]], "double") - expect_length(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["mean_expr"]], 161) - expect_type(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["mean_expr"]], "double") - expect_length(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["mean_expr_det"]], 161) - expect_type(slot(object@feat_metadata[["cell"]][["rna"]], 'metaDT')[["mean_expr_det"]], "double") - + # gobject cell metadata contains nr_feats, perc_feats, total_expr + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["nr_feats"]], 73655) + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["nr_feats"]], "integer") + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["perc_feats"]], 73655) + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["perc_feats"]], "double") + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["total_expr"]], 73655) + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["total_expr"]], "double") + + # gobject feat metadata contains nr_cells, perc_cells, total_expr, mean_expr, mean_expr_det + expect_length(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["nr_cells"]], 161) + expect_type(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["nr_cells"]], "integer") + expect_length(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["perc_cells"]], 161) + expect_type(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["perc_cells"]], "double") + expect_length(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["total_expr"]], 161) + expect_type(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["total_expr"]], "double") + expect_length(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["mean_expr"]], 161) + expect_type(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["mean_expr"]], "double") + expect_length(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["mean_expr_det"]], 161) + expect_type(slot(object@feat_metadata[["cell"]][["rna"]], "metaDT")[["mean_expr_det"]], "double") }) # ADJUST EXPRESSION VALUES FOR BATCH/COVARIATES -object <- adjustGiottoMatrix(gobject = object, expression_values = c('normalized'), - batch_columns = NULL, covariate_columns = c('layer_ID'), - return_gobject = TRUE, - update_slot = c('custom')) +object <- adjustGiottoMatrix( + gobject = object, expression_values = c("normalized"), + batch_columns = NULL, covariate_columns = c("layer_ID"), + return_gobject = TRUE, + update_slot = c("custom") +) test_that("Adjusted values are created in 'custom' slot", { - - # expression now also contains custom object of class double - 
expect_type(slot(object@expression[["cell"]][["rna"]][["custom"]], 'exprMat')[1], "double") - expect_equal(nrow(slot(object@expression[["cell"]][["rna"]][["custom"]], 'exprMat')), 161) - expect_equal(ncol(slot(object@expression[["cell"]][["rna"]][["custom"]], 'exprMat')), 73655) - + # expression now also contains custom object of class double + expect_type(slot(object@expression[["cell"]][["rna"]][["custom"]], "exprMat")[1], "double") + expect_equal(nrow(slot(object@expression[["cell"]][["rna"]][["custom"]], "exprMat")), 161) + expect_equal(ncol(slot(object@expression[["cell"]][["rna"]][["custom"]], "exprMat")), 73655) }) # RUN DIMENSION REDUCTION -object <- suppressWarnings(runPCA(gobject = object, - genes_to_use = NULL, - scale_unit = FALSE, - center = TRUE, - verbose = FALSE)) +object <- suppressWarnings(runPCA( + gobject = object, + genes_to_use = NULL, + scale_unit = FALSE, + center = TRUE, + verbose = FALSE +)) test_that("PCA S4 object is created as expected", { - - # s4 object of class "dimObj" - expect_s4_class(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]], "dimObj") - - # coordinates double of dims 73655 x 100 - expect_equal(nrow(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates), 73655) - expect_equal(ncol(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates), 100) - - # test a few arbitrary coordinates - expect_equal(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates[5], - -5.612915, - tolerance = 1*10^-3) - expect_equal(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates[10], - -24.66273, - tolerance = 1*10^-3) - + # s4 object of class "dimObj" + expect_s4_class(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]], "dimObj") + + # coordinates double of dims 73655 x 100 + expect_equal(nrow(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates), 73655) + expect_equal(ncol(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates), 100) + + # test a few arbitrary coordinates + expect_equal(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates[5], + -5.612915, + tolerance = 1 * 10^-3 + ) + expect_equal(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["pca"]][["pca"]]@coordinates[10], + -24.66273, + tolerance = 1 * 10^-3 + ) }) # UMAP object <- runUMAP(object, dimensions_to_use = 1:8, n_components = 3, n_threads = 4, verbose = FALSE) test_that("UMAP S4 object is created as expected", { - - # s4 object of class "dimObj" - expect_s4_class(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]], "dimObj") - - # coordinates double of dims 73655 x 3 - expect_equal(nrow(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates), 73655) - expect_equal(ncol(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates), 3) - - # test a few arbitrary coordinates - show_failure(expect_equal( - !!object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates["cell_20","Dim.1"], - 2.354577, - tolerance = 1*10^-1) - ) - show_failure(expect_equal( - !!object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates["cell_40","Dim.1"], - -10.2164, - tolerance = 1*10^-1) - ) - + # s4 object of class "dimObj" + 
expect_s4_class(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]], "dimObj") + + # coordinates double of dims 73655 x 3 + expect_equal(nrow(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates), 73655) + expect_equal(ncol(object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates), 3) + + # test a few arbitrary coordinates + show_failure(expect_equal( + !!object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates["cell_20", "Dim.1"], + 2.354577, + tolerance = 1 * 10^-1 + )) + show_failure(expect_equal( + !!object@dimension_reduction[["cells"]][["cell"]][["rna"]][["umap"]][["umap"]]@coordinates["cell_40", "Dim.1"], + -10.2164, + tolerance = 1 * 10^-1 + )) }) # CREATE NETWORK object <- createNearestNetwork(gobject = object, dimensions_to_use = 1:8, k = 15, verbose = FALSE) test_that("sNN S3 object is created as expected", { - - # igraph s3 object - expect_s3_class(slot(object@nn_network[["cell"]][["rna"]][["sNN"]][["sNN.pca"]], 'igraph'), "igraph") - + # igraph s3 object + expect_s3_class(slot(object@nn_network[["cell"]][["rna"]][["sNN"]][["sNN.pca"]], "igraph"), "igraph") }) # LEIDEN CLUSTERING object <- doLeidenCluster( - gobject = object, - resolution = 0.2, - n_iterations = 200, - name = 'leiden_0.2' + gobject = object, + resolution = 0.2, + n_iterations = 200, + name = "leiden_0.2" ) test_that("New clusters are added to cell metadata", { + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["leiden_0.2"]], 73655) - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["leiden_0.2"]], 73655) - - # test a few cluster assignments - expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["leiden_0.2"]][10], 5) - expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["leiden_0.2"]][80], 4) - + # test a few cluster assignments + expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["leiden_0.2"]][10], 5) + expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["leiden_0.2"]][80], 4) }) # CELL TYPE MARKER GENE DETECTION -markers = findMarkers_one_vs_all(gobject = object, - method = 'gini', - expression_values = 'normalized', - cluster_column = 'leiden_0.2', - min_feats = 1, rank_score = 2, - verbose = FALSE) +markers <- findMarkers_one_vs_all( + gobject = object, + method = "gini", + expression_values = "normalized", + cluster_column = "leiden_0.2", + min_feats = 1, rank_score = 2, + verbose = FALSE +) test_that("Cell type markers are detected", { - - # markers col names - expect_named(markers, c("feats", "cluster", "expression", "expression_gini", - "detection", "detection_gini", "expression_rank", - "detection_rank", "comb_score", "comb_rank")) - - # number of markers - expect_equal(nrow(markers), 585) - + # markers col names + expect_named(markers, c( + "feats", "cluster", "expression", "expression_gini", + "detection", "detection_gini", "expression_rank", + "detection_rank", "comb_score", "comb_rank" + )) + + # number of markers + expect_equal(nrow(markers), 585) }) # CLUSTER ANNOTATION -selected_genes = c('Myh11', 'Klf4', 'Fn1', 'Cd24a', 'Cyr61', 'Nnat', - 'Trh', 'Selplg', 'Pou3f2', 'Aqp4', 'Traf4', - 'Pdgfra', 'Opalin', 'Mbp', 'Ttyh2', 'Fezf1', - 'Cbln1', 'Slc17a6', 'Scg2', 'Isl1', 'Gad1') -cluster_order = c(6, 11, 9, 12, 4, 8, 7, 5, 13, 3, 1, 2, 10) -clusters_cell_types_hypo = c('Inhibitory', 'Inhibitory', 'Excitatory', - 'Astrocyte','OD Mature', 'Endothelial', - 'OD 
Mature', 'OD Immature', 'Ependymal', 'Ambiguous', - 'Endothelial', 'Microglia', 'OD Mature') +selected_genes <- c( + "Myh11", "Klf4", "Fn1", "Cd24a", "Cyr61", "Nnat", + "Trh", "Selplg", "Pou3f2", "Aqp4", "Traf4", + "Pdgfra", "Opalin", "Mbp", "Ttyh2", "Fezf1", + "Cbln1", "Slc17a6", "Scg2", "Isl1", "Gad1" +) +cluster_order <- c(6, 11, 9, 12, 4, 8, 7, 5, 13, 3, 1, 2, 10) +clusters_cell_types_hypo <- c( + "Inhibitory", "Inhibitory", "Excitatory", + "Astrocyte", "OD Mature", "Endothelial", + "OD Mature", "OD Immature", "Ependymal", "Ambiguous", + "Endothelial", "Microglia", "OD Mature" +) -names(clusters_cell_types_hypo) = as.character(sort(cluster_order)) -object = annotateGiotto(gobject = object, annotation_vector = clusters_cell_types_hypo, - cluster_column = 'leiden_0.2', name = 'cell_types') +names(clusters_cell_types_hypo) <- as.character(sort(cluster_order)) +object <- annotateGiotto( + gobject = object, annotation_vector = clusters_cell_types_hypo, + cluster_column = "leiden_0.2", name = "cell_types" +) test_that("Cell type annotations are added to cell metadata", { + expect_type(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["cell_types"]], "character") + expect_length(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["cell_types"]], 73655) - expect_type(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["cell_types"]], "character") - expect_length(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["cell_types"]], 73655) - - # check a few annotations - expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["cell_types"]][5], "Inhibitory") - expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], 'metaDT')[["cell_types"]][250], "OD Immature") - + # check a few annotations + expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["cell_types"]][5], "Inhibitory") + expect_equal(slot(object@cell_metadata[["cell"]][["rna"]], "metaDT")[["cell_types"]][250], "OD Immature") }) @@ -281,4 +278,3 @@ test_that("Cell type annotations are added to cell metadata", { if (file.exists(expr_path)) unlink(expr_path) if (file.exists(loc_path)) unlink(loc_path) if (file.exists(meta_path)) unlink(meta_path) - diff --git a/tests/testthat/test_processing.R b/tests/testthat/test_processing.R index 5d9240ef9..e946eb758 100644 --- a/tests/testthat/test_processing.R +++ b/tests/testthat/test_processing.R @@ -11,16 +11,16 @@ g <- GiottoData::loadGiottoMini("visium") # normalize #### test_that("expression is normalized", { - rlang::local_options(lifecycle_verbosity = "quiet") - # remove normalized and scaled expr matrices - g <- setExpression(g, NULL, spat_unit = "cell", feat_type = "rna", name = "normalized", verbose = FALSE) - g <- setExpression(g, NULL, spat_unit = "cell", feat_type = "rna", name = "scaled", verbose = FALSE) + rlang::local_options(lifecycle_verbosity = "quiet") + # remove normalized and scaled expr matrices + g <- setExpression(g, NULL, spat_unit = "cell", feat_type = "rna", name = "normalized", verbose = FALSE) + g <- setExpression(g, NULL, spat_unit = "cell", feat_type = "rna", name = "scaled", verbose = FALSE) - expect_false(any(c("normalized", "scaled") %in% list_expression(g)$name)) + expect_false(any(c("normalized", "scaled") %in% list_expression(g)$name)) - g <- normalizeGiotto(g, verbose = FALSE) + g <- normalizeGiotto(g, verbose = FALSE) - expect_true(any(c("normalized", "scaled") %in% list_expression(g)$name)) + expect_true(any(c("normalized", "scaled") %in% list_expression(g)$name)) }) @@ -28,57 +28,56 @@ 
test_that("expression is normalized", { # hvf #### test_that("highly variable gene detection", { - rlang::local_options(lifecycle_verbosity = "quiet") - # reset feature metadata - g <- setFeatureMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) - expect_false("hvf" %in% names(fDataDT(g))) + rlang::local_options(lifecycle_verbosity = "quiet") + # reset feature metadata + g <- setFeatureMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) + expect_false("hvf" %in% names(fDataDT(g))) - g <- calculateHVF(g) - expect_true("hvf" %in% names(fDataDT(g))) + g <- calculateHVF(g) + expect_true("hvf" %in% names(fDataDT(g))) - # character "yes" and "no" expected - checkmate::expect_character(fDataDT(g)$hvf) + # character "yes" and "no" expected + checkmate::expect_character(fDataDT(g)$hvf) }) test_that("highly variable gene detections - pearson resid", { - rlang::local_options(lifecycle_verbosity = "quiet") - # reset feature metadata - g <- setFeatureMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) - expect_false(any(c("var", "hvf") %in% names(fDataDT(g)))) - - g <- normalizeGiotto( - gobject = g, - feat_type = 'rna', - scalefactor = 5000, - verbose = FALSE, - norm_methods = 'pearson_resid', - update_slot = 'pearson' - ) - - g <- calculateHVF( - g, - method = 'var_p_resid', - var_threshold = 7, - expression_values = 'pearson' - ) - - expect_true(any(c("var", "hvf") %in% names(fDataDT(g)))) - - # character "yes" and "no" expected - checkmate::expect_numeric(fDataDT(g)$var) - checkmate::expect_character(fDataDT(g)$hvf) + rlang::local_options(lifecycle_verbosity = "quiet") + # reset feature metadata + g <- setFeatureMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) + expect_false(any(c("var", "hvf") %in% names(fDataDT(g)))) + + g <- normalizeGiotto( + gobject = g, + feat_type = "rna", + scalefactor = 5000, + verbose = FALSE, + norm_methods = "pearson_resid", + update_slot = "pearson" + ) + + g <- calculateHVF( + g, + method = "var_p_resid", + var_threshold = 7, + expression_values = "pearson" + ) + + expect_true(any(c("var", "hvf") %in% names(fDataDT(g)))) + + # character "yes" and "no" expected + checkmate::expect_numeric(fDataDT(g)$var) + checkmate::expect_character(fDataDT(g)$hvf) }) # statistics #### test_that("statistics are added", { - rlang::local_options(lifecycle_verbosity = "quiet") - # reset cell metadata - g <- setCellMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) - expect_false(any(c("nr_feats", "perc_feats", "total_expr") %in% names(pDataDT(g)))) + rlang::local_options(lifecycle_verbosity = "quiet") + # reset cell metadata + g <- setCellMetadata(g, NULL, spat_unit = "cell", feat_type = "rna", verbose = FALSE) + expect_false(any(c("nr_feats", "perc_feats", "total_expr") %in% names(pDataDT(g)))) - g <- addStatistics(g) - expect_true(any(c("nr_feats", "perc_feats", "total_expr") %in% names(pDataDT(g)))) + g <- addStatistics(g) + expect_true(any(c("nr_feats", "perc_feats", "total_expr") %in% names(pDataDT(g)))) }) - diff --git a/tests/testthat/test_visium.R b/tests/testthat/test_visium.R index e7c3df814..1e3fd6f14 100644 --- a/tests/testthat/test_visium.R +++ b/tests/testthat/test_visium.R @@ -1,40 +1,39 @@ - # download files for Adult Mouse Olfactory Bulb (Space Ranger v1.3.0) #### # https://www.10xgenomics.com/datasets/adult-mouse-olfactory-bulb-1-standard # chosen for the small size of the dataset urls <- c( - 
"https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_filtered_feature_bc_matrix.h5", - "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_filtered_feature_bc_matrix.tar.gz", - "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_raw_feature_bc_matrix.h5", - "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_raw_feature_bc_matrix.tar.gz", - "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_spatial.tar.gz" + "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_filtered_feature_bc_matrix.h5", + "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_filtered_feature_bc_matrix.tar.gz", + "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_raw_feature_bc_matrix.h5", + "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_raw_feature_bc_matrix.tar.gz", + "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_spatial.tar.gz" ) -datadir <- paste0(getwd(), '/testdata/vis_1_3_0') +datadir <- paste0(getwd(), "/testdata/vis_1_3_0") if (!dir.exists(datadir)) dir.create(datadir) # download files lapply( - urls, - function(url) { - myfilename <- basename(url) - mydestfile <- file.path(datadir, myfilename) - utils::download.file(url = url, destfile = mydestfile, quiet = TRUE) - } + urls, + function(url) { + myfilename <- basename(url) + mydestfile <- file.path(datadir, myfilename) + utils::download.file(url = url, destfile = mydestfile, quiet = TRUE) + } ) manifest <- file.path(datadir, basename(urls)) names(manifest) <- gsub("Visium_Mouse_Olfactory_Bulb_", "", basename(manifest)) lapply( - manifest[c( - "filtered_feature_bc_matrix.tar.gz", - "raw_feature_bc_matrix.tar.gz", - "spatial.tar.gz" - )], - utils::untar, - exdir = datadir + manifest[c( + "filtered_feature_bc_matrix.tar.gz", + "raw_feature_bc_matrix.tar.gz", + "spatial.tar.gz" + )], + utils::untar, + exdir = datadir ) options("giotto.use_conda" = FALSE) @@ -45,68 +44,68 @@ options("giotto.use_conda" = FALSE) # create: directory #### ext_to_rounded_num <- function(e) { - num <- round(ext(e)[]) - names(num) <- NULL - return(num) + num <- round(ext(e)[]) + names(num) <- NULL + return(num) } test_that("visium create dir raw is working", { - # unfiltered (all spots) - g_nofil <- suppressWarnings(createGiottoVisiumObject( - visium_dir = datadir, - expr_data = "raw", - verbose = FALSE - )) - expect_true(validObject(g_nofil)) - # meta - expect_true(nrow(pDataDT(g_nofil)) == 4992) - expect_true(nrow(fDataDT(g_nofil)) == 19500) - expect_true("ENSMUSG00000051951" %in% fDataDT(g_nofil)$feat_ID) - # expression - expect_true(identical(dim(getExpression(g_nofil)), c(19500L, 4992L))) - # spatlocs - expect_equal(nrow(getSpatialLocations(g_nofil)), 4992L) - # image - i_nofil <- getGiottoImage(g_nofil, image_type = "largeImage") - expect_s4_class(i_nofil, "giottoLargeImage") - expect_identical(dim(i_nofil), c(2000, 2000, 3)) - expect_equal(ext_to_rounded_num(i_nofil), c(0, 1e4, -1e4, 0)) - - - # polys - p_nofil <- getPolygonInfo(g_nofil, return_giottoPolygon = TRUE) - expect_s4_class(p_nofil, 
"giottoPolygon") - expect_true(nrow(p_nofil) == 4992) - expect_equal(ext_to_rounded_num(p_nofil), c(1057, 8726, -8705, -1431)) + # unfiltered (all spots) + g_nofil <- suppressWarnings(createGiottoVisiumObject( + visium_dir = datadir, + expr_data = "raw", + verbose = FALSE + )) + expect_true(validObject(g_nofil)) + # meta + expect_true(nrow(pDataDT(g_nofil)) == 4992) + expect_true(nrow(fDataDT(g_nofil)) == 19500) + expect_true("ENSMUSG00000051951" %in% fDataDT(g_nofil)$feat_ID) + # expression + expect_true(identical(dim(getExpression(g_nofil)), c(19500L, 4992L))) + # spatlocs + expect_equal(nrow(getSpatialLocations(g_nofil)), 4992L) + # image + i_nofil <- getGiottoImage(g_nofil, image_type = "largeImage") + expect_s4_class(i_nofil, "giottoLargeImage") + expect_identical(dim(i_nofil), c(2000, 2000, 3)) + expect_equal(ext_to_rounded_num(i_nofil), c(0, 1e4, -1e4, 0)) + + + # polys + p_nofil <- getPolygonInfo(g_nofil, return_giottoPolygon = TRUE) + expect_s4_class(p_nofil, "giottoPolygon") + expect_true(nrow(p_nofil) == 4992) + expect_equal(ext_to_rounded_num(p_nofil), c(1057, 8726, -8705, -1431)) }) test_that("visium create dir filtered is working", { - # filtered (in_tissue spots only) - g_fil <- suppressWarnings(createGiottoVisiumObject( - visium_dir = datadir, - expr_data = "filter", - verbose = FALSE - )) - expect_true(validObject(g_fil)) - # meta - expect_true(nrow(pDataDT(g_fil)) == 1185) - expect_true(nrow(fDataDT(g_fil)) == 19332) - expect_true("ENSMUSG00000051951" %in% fDataDT(g_fil)$feat_ID) - # expression - expect_true(identical(dim(getExpression(g_fil)), c(19332L, 1185L))) - # spatlocs - expect_equal(nrow(getSpatialLocations(g_fil)), 1185L) - # image - i_fil <- getGiottoImage(g_fil, image_type = "largeImage") - expect_s4_class(i_fil, "giottoLargeImage") - expect_identical(dim(i_fil), c(2000, 2000, 3)) - expect_equal(ext_to_rounded_num(i_fil), c(0, 1e4, -1e4, 0)) - - # polys - p_fil <- getPolygonInfo(g_fil, return_giottoPolygon = TRUE) - expect_s4_class(p_fil, "giottoPolygon") - expect_true(nrow(p_fil) == 1185) - expect_equal(ext_to_rounded_num(p_fil), c(3917, 7244, -7625, -2737)) + # filtered (in_tissue spots only) + g_fil <- suppressWarnings(createGiottoVisiumObject( + visium_dir = datadir, + expr_data = "filter", + verbose = FALSE + )) + expect_true(validObject(g_fil)) + # meta + expect_true(nrow(pDataDT(g_fil)) == 1185) + expect_true(nrow(fDataDT(g_fil)) == 19332) + expect_true("ENSMUSG00000051951" %in% fDataDT(g_fil)$feat_ID) + # expression + expect_true(identical(dim(getExpression(g_fil)), c(19332L, 1185L))) + # spatlocs + expect_equal(nrow(getSpatialLocations(g_fil)), 1185L) + # image + i_fil <- getGiottoImage(g_fil, image_type = "largeImage") + expect_s4_class(i_fil, "giottoLargeImage") + expect_identical(dim(i_fil), c(2000, 2000, 3)) + expect_equal(ext_to_rounded_num(i_fil), c(0, 1e4, -1e4, 0)) + + # polys + p_fil <- getPolygonInfo(g_fil, return_giottoPolygon = TRUE) + expect_s4_class(p_fil, "giottoPolygon") + expect_true(nrow(p_fil) == 1185) + expect_equal(ext_to_rounded_num(p_fil), c(3917, 7244, -7625, -2737)) }) @@ -115,78 +114,77 @@ test_that("visium create dir filtered is working", { # create: H5 #### test_that("visium create H5 raw is working", { - g_nofil <- suppressWarnings(createGiottoVisiumObject( - h5_visium_path = manifest["raw_feature_bc_matrix.h5"], - h5_gene_ids = "symbols", - h5_json_scalefactors_path = file.path( - datadir, "spatial", "scalefactors_json.json"), - h5_image_png_path = file.path( - datadir, "spatial", "tissue_lowres_image.png"), - 
h5_tissue_positions_path = file.path( - datadir, "spatial", "tissue_positions_list.csv") - )) - - expect_true(validObject(g_nofil)) - # meta - expect_true(nrow(pDataDT(g_nofil)) == 4992) - expect_true(nrow(fDataDT(g_nofil)) == 19500) - expect_true("Xkr4" %in% fDataDT(g_nofil)$feat_ID) - # expression - expect_true(identical(dim(getExpression(g_nofil)), c(19500L, 4992L))) - # spatlocs - expect_equal(nrow(getSpatialLocations(g_nofil)), 4992L) - # image - i_nofil <- getGiottoImage(g_nofil, image_type = "largeImage") - expect_s4_class(i_nofil, "giottoLargeImage") - expect_identical(dim(i_nofil), c(600, 600, 3)) - expect_equal(ext_to_rounded_num(i_nofil), c(0, 1e4, -1e4, 0)) - - - # polys - p_nofil <- getPolygonInfo(g_nofil, return_giottoPolygon = TRUE) - expect_s4_class(p_nofil, "giottoPolygon") - expect_true(nrow(p_nofil) == 4992) - expect_equal(ext_to_rounded_num(p_nofil), c(1057, 8726, -8705, -1431)) + g_nofil <- suppressWarnings(createGiottoVisiumObject( + h5_visium_path = manifest["raw_feature_bc_matrix.h5"], + h5_gene_ids = "symbols", + h5_json_scalefactors_path = file.path( + datadir, "spatial", "scalefactors_json.json" + ), + h5_image_png_path = file.path( + datadir, "spatial", "tissue_lowres_image.png" + ), + h5_tissue_positions_path = file.path( + datadir, "spatial", "tissue_positions_list.csv" + ) + )) + + expect_true(validObject(g_nofil)) + # meta + expect_true(nrow(pDataDT(g_nofil)) == 4992) + expect_true(nrow(fDataDT(g_nofil)) == 19500) + expect_true("Xkr4" %in% fDataDT(g_nofil)$feat_ID) + # expression + expect_true(identical(dim(getExpression(g_nofil)), c(19500L, 4992L))) + # spatlocs + expect_equal(nrow(getSpatialLocations(g_nofil)), 4992L) + # image + i_nofil <- getGiottoImage(g_nofil, image_type = "largeImage") + expect_s4_class(i_nofil, "giottoLargeImage") + expect_identical(dim(i_nofil), c(600, 600, 3)) + expect_equal(ext_to_rounded_num(i_nofil), c(0, 1e4, -1e4, 0)) + + + # polys + p_nofil <- getPolygonInfo(g_nofil, return_giottoPolygon = TRUE) + expect_s4_class(p_nofil, "giottoPolygon") + expect_true(nrow(p_nofil) == 4992) + expect_equal(ext_to_rounded_num(p_nofil), c(1057, 8726, -8705, -1431)) }) test_that("visium create H5 filtered is working", { - g_fil <- suppressWarnings(createGiottoVisiumObject( - h5_visium_path = manifest["filtered_feature_bc_matrix.h5"], - h5_gene_ids = "ensembl", - h5_json_scalefactors_path = file.path( - datadir, "spatial", "scalefactors_json.json"), - h5_image_png_path = file.path( - datadir, "spatial", "tissue_lowres_image.png"), - h5_tissue_positions_path = file.path( - datadir, "spatial", "tissue_positions_list.csv") - )) - - expect_true(validObject(g_fil)) - # meta - expect_true(nrow(pDataDT(g_fil)) == 1185) - expect_true(nrow(fDataDT(g_fil)) == 19332) - expect_true("ENSMUSG00000051951" %in% fDataDT(g_fil)$feat_ID) - # expression - expect_true(identical(dim(getExpression(g_fil)), c(19332L, 1185L))) - # spatlocs - expect_equal(nrow(getSpatialLocations(g_fil)), 1185L) - # image - i_fil <- getGiottoImage(g_fil, image_type = "largeImage") - expect_s4_class(i_fil, "giottoLargeImage") - expect_identical(dim(i_fil), c(600, 600, 3)) - expect_equal(ext_to_rounded_num(i_fil), c(0, 1e4, -1e4, 0)) - - # polys - p_fil <- getPolygonInfo(g_fil, return_giottoPolygon = TRUE) - expect_s4_class(p_fil, "giottoPolygon") - expect_true(nrow(p_fil) == 1185) - expect_equal(ext_to_rounded_num(p_fil), c(3917, 7244, -7625, -2737)) + g_fil <- suppressWarnings(createGiottoVisiumObject( + h5_visium_path = manifest["filtered_feature_bc_matrix.h5"], + h5_gene_ids = 
"ensembl", + h5_json_scalefactors_path = file.path( + datadir, "spatial", "scalefactors_json.json" + ), + h5_image_png_path = file.path( + datadir, "spatial", "tissue_lowres_image.png" + ), + h5_tissue_positions_path = file.path( + datadir, "spatial", "tissue_positions_list.csv" + ) + )) + + expect_true(validObject(g_fil)) + # meta + expect_true(nrow(pDataDT(g_fil)) == 1185) + expect_true(nrow(fDataDT(g_fil)) == 19332) + expect_true("ENSMUSG00000051951" %in% fDataDT(g_fil)$feat_ID) + # expression + expect_true(identical(dim(getExpression(g_fil)), c(19332L, 1185L))) + # spatlocs + expect_equal(nrow(getSpatialLocations(g_fil)), 1185L) + # image + i_fil <- getGiottoImage(g_fil, image_type = "largeImage") + expect_s4_class(i_fil, "giottoLargeImage") + expect_identical(dim(i_fil), c(600, 600, 3)) + expect_equal(ext_to_rounded_num(i_fil), c(0, 1e4, -1e4, 0)) + + # polys + p_fil <- getPolygonInfo(g_fil, return_giottoPolygon = TRUE) + expect_s4_class(p_fil, "giottoPolygon") + expect_true(nrow(p_fil) == 1185) + expect_equal(ext_to_rounded_num(p_fil), c(3917, 7244, -7625, -2737)) }) - - - - - - - diff --git a/vignettes/.gitignore b/vignettes/.gitignore new file mode 100644 index 000000000..097b24163 --- /dev/null +++ b/vignettes/.gitignore @@ -0,0 +1,2 @@ +*.html +*.R diff --git a/vignettes/intro_to_giotto.Rmd b/vignettes/intro_to_giotto.Rmd new file mode 100644 index 000000000..a522bf1f8 --- /dev/null +++ b/vignettes/intro_to_giotto.Rmd @@ -0,0 +1,112 @@ +--- +title: "intro_to_giotto" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{intro_to_giotto} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +This is an example of a standard visium analysis. For more information and additional examples, visit out Giotto website at http://giottosuite.com + +Load package + +```{r setup} +library(Giotto) +``` + +Load a mini visium object + +```{r} +g <- GiottoData::loadGiottoMini("visium") +``` + +Filtering + +```{r} +g <- filterGiotto(g) +``` + +Normalization + +```{r} +g <- normalizeGiotto(g) +``` + +Add statistics + +```{r} +g <- addStatistics(g) +``` + +Find highly variable features + +```{r} +g <- calculateHVF(g) +``` + +Run PCA + +```{r} +g <- runPCA(g) +``` + +Plot PCA + +```{r} +plotPCA(g, + cell_color = "leiden_clus") +``` + +Run UMAP + +```{r} +g <- runUMAP(g) +``` + +Plot UMAP + +```{r} +plotUMAP(g, + cell_color = "leiden_clus") +``` + +Run tSNE + +```{r} +g <- runtSNE(g) +``` + +Plot tSNE + +```{r} +plotTSNE(g, + cell_color = "leiden_clus") +``` + +Do clustering + +```{r} +g <- doLeidenCluster(g) +``` + +Spatial plot with clusters + +```{r} +spatPlot2D(g, + cell_color = "leiden_clus") +``` + +Session info + +```{r} +sessionInfo() +``` +