Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Codable support for NSManagedObject subclasses #377

Closed
atomicbird opened this issue Sep 15, 2018 · 13 comments
Closed

Implement Codable support for NSManagedObject subclasses #377

atomicbird opened this issue Sep 15, 2018 · 13 comments

Comments

@atomicbird
Copy link
Collaborator

Feature Request

Core Data is often used to store data downloaded from a server API of some kind. Server APIs often provide data as one or more JSON dictionaries. Parsing these with JSONDecoder is convenient but creating managed object instances isn't possible without some extra work. This can and should be provided by mogenerator.

Rationale

Creating managed objects via Codable is complicated by the fact that Codable and NSManagedObject have conflicting ideas about initialization. Codable instantiates objects by calling init(from decoder: Decoder) (which will be auto-generated if no implementation exists). Meanwhile NSManagedObject's designated initializer is init(entity: NSEntityDescription, insertInto context: NSManagedObjectContext?).

The key to resolving this is that the Decoder protocol includes a userInfo dictionary typed as [CodingUserInfoKey : Any]. Implementing Codable is possible if you

  1. Include an NSManagedObject context in the userInfo dictionary
  2. Implement a custom init(from decoder: Decoder) rather than rely on synthesized code
  3. In this initializer, look up the context and use it to call NSManagedObject's designated initializer.

This amounts to a bunch of boilerplate code, customized to each entity, which is what mogenerator is good at.

What about find-or-create?

It's often necessary to implement a find-or-create approach to handle incoming server data. A common situation is that an app receives an array of dictionaries representing data objects. Some are new and should be created. Some are updated versions of existing data, so the existing object should be updated. This can be handled via a unique ID included in each instance. Core Data uniqueness constraints are capable of handling this, but may incur significant performance issues for large data sets.

Any implementation of Codable for NSManagedObject subclasses need to take this into account. Implementing find-or-create should not require a uniqueness constraint, but shouldn't prevent them for developers who prefer them.

To this end, any implementation of Codable should include the option to create non-inserted managed objects. These could be used in a custom find-or-create implementation. New objects could be inserted into the appropriate managed object context via the insert(_:) method. New versions of existing objects could be used to update the model, but since they were never inserted, they wouldn't lead to duplicate data. Whether to insert new objects could be indicated by another userInfo key.

Goal

Based on the above, code generated by mogenerator would first extend CodingKeys to handle the requirements above:

extension CodingUserInfoKey {
    /// Required. Must be an NSManagedObjectContext.
    static let context = CodingUserInfoKey(rawValue: "context")!
    /// Optional. Boolean. If present and true, newly created objects are not inserted into the context.
    static let deferInsertion = CodingUserInfoKey(rawValue: "deferInsertion")!
}

Developers would use the new mogenerator implementation of Codable as follows. Given a Data named incomingData...

If incomingData should contain a single instance of the Event entity:

        let decoder = JSONDecoder()
        decoder.userInfo[.context] = self.managedObjectContext
        let newEvent: Event? = try? decoder.decode(Event.self, from: incomingData)

If incomingData should contain an array of Event data, and custom find-or-create code will be used:

        let data = Data()
        let decoder = JSONDecoder()
        decoder.userInfo[.context] = self.managedObjectContext
        decoder.userInfo[.deferInsertion] = true
        let newEvents: [Event]? = try? decoder.decode([Event].self, from: incomingData)

The deferInsertion key indicates that new object should be instantiated but not inserted into the managed object context yet. Later, the find-or-create code can insert(_:) them if appropriate.

Sample Implementation

Below is an example of what mogenerator might produce for an entity. Some things to note:

  1. The swift42 branch currently uses a class for the "human" file and an extension for the "machine" file. Implementing Codable probably requires reverting to the previous approach of using two classes rather than one plus an extension. This is because init(from decoder: Decoder) can't be in an extension, but really should be in the "machine" file. I'd prefer to stick with class plus extension but not at the expense of putting this much generated code in the "human" file.

  2. All objects are initially not inserted, because decoding values can throw. If an error is thrown, not inserting means there's no need to clean up a half-initialized managed object. If no errors are thrown, the object is inserted.

  3. This code has a lot more comments than would be in generated code, by way of explaining what I'm trying to do with it.

@objc(_Event)
open class _Event: NSManagedObject, Codable {
    // This enum isn't named CodingKeys because it includes references to Core Data attributes. CodingKeys requires references to stored properties.
    // It should be possible to select whether each attribtue is included in this enum, and to specify an alternate name. It would be nice if not marking any attributes as included implied that all were included, so that in most cases it would only be necessary to make the entity Codable and be done with it.
    // The data model can include alternate names for keys to use when decoding/encoding.
    private enum EventCodingKeys: String, CodingKey, CaseIterable {
        case capacity  // Optional Int
        case eventType // Optional EventType (enum, raw value = Int)
        case eventTypeString // Optional EventTypeString (enum, raw value = String)
        case name = "eventName" // Required String
        case organizer // Optional OrganizerContactInfo (Codable struct)
        case timestamp // Optional Date
    }
    
    enum EventDecodeError: Error {
        case contextNotFound
        case entityNotFound
    }
    
    required convenience public init(from decoder: Decoder) throws {
        guard let context = decoder.userInfo[.context] as? NSManagedObjectContext else {
            throw EventDecodeError.contextNotFound
        }
        
        guard let entity = Event.entity(managedObjectContext: context) else {
            throw EventDecodeError.entityNotFound
        }
        
        // Initialize but don't insert into the context yet. Leave inerting until after decoding keys, in case we throw.
        self.init(entity: entity, insertInto: nil)
        
        let values = try decoder.container(keyedBy: EventCodingKeys.self)
        
        // The following should use `try?` for optional keys, `try` for required keys.
        for key in values.allKeys {
            switch key {
            case .capacity: capacity = try? values.decode(Int32.self, forKey: .capacity)
            case .eventType: // optional enum
                // We could set the primitive type directly but doing it this way ensures that the incoming value is valid for the enum.
                if let rawValue = try? values.decode(Int.self, forKey: .eventType),
                    let enumValue = EventType(rawValue: rawValue) {
                    eventType = enumValue
                }
            case .eventTypeString:
                // Optional enum with String raw values
                if let rawValue = try? values.decode(String.self, forKey: .eventTypeString),
                    let enumValue = EventTypeString(rawValue: rawValue) {
                    eventTypeString = enumValue
                }
            case .name:
                // Non-optional String, so throw if it's not present.
                name = try values.decode(String.self, forKey: .name)
            case .organizer:
                // This has type `OrganizerContactInfo`, which is `Codable`.
                organizer = try? values.decode(OrganizerContactInfo.self, forKey: .organizer)
            case .timestamp:
                timestamp = try? values.decode(Date.self, forKey: .timestamp)
            }
        }
        
        // If we made it this far without throwing, insert self into the context.
        // Default is to insert. If deferInsertion is present and `true`, do not insert.
        let deferInsertion = decoder.userInfo[CodingUserInfoKey.deferInsertion] as? Bool ?? false
        if !deferInsertion {
            context.insert(self)
        }
    }
    
    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: EventCodingKeys.self)
        // Use `encodeIfPresent` for optional attributes, `encode` for required.
        for keyName in EventCodingKeys.allCases {
            switch keyName {
            case .capacity:
                try container.encodeIfPresent(capacity, forKey: .capacity)
            case .eventType:
                try container.encodeIfPresent(eventType?.rawValue, forKey: .eventType)
            case .eventTypeString:
                try container.encodeIfPresent(eventTypeString?.rawValue, forKey: .eventTypeString)
            case .name:
                try container.encode(name, forKey: .name)
            case .organizer:
                try container.encodeIfPresent(organizer, forKey: .organizer)
            case .timestamp:
                try container.encodeIfPresent(timestamp, forKey: .timestamp)
            }
        }
    }

    // ....followed by the rest of the generated code...
}

Please Comment

This is not implemented yet-- I've been assembling the sample code above before modifying mogenerator to produce it, because it's a big addition and I want to make sure I know what target I'm aiming for. If you use or might use Codable with Core Data, please let me know what you think. Thanks!

@atomicbird atomicbird changed the title Implement Codable support for NSManagedObject subclasses Implement Codable support for NSManagedObject subclasses Sep 16, 2018
@radianttap
Copy link
Contributor

I'm really conflicted about this.

On one side, this is good implementation of Codable. I don't like the two classes approach but I agree it's necessary in this case, since the generated code will be larger than human-generated in most use cases. I understand the desire and benefits of automating mundane conversion tasks like this.

On the other side, I find it untenable in practice.
I had the same (say) Event object using different JSON representations, depending on who wrote the backend code at the time, over a period of few years. So you'll need more than one EventCodingKeys enums to cover such issues. Or you have the same JSON key that maps to particular Event attribute but in one JSON response it's numeric and in the other is boolean on even a String. Enforcing the fix for this on the backend is mission impossible when you're brought in as contractor, as you can imagine.

Maybe I'm just unlucky but in last 5 projects I had at least one such case (multiple JSON structures for the same model object) in each of them. I gave up on automating that conversion and just write the inits by hand.

Re: automatic find-or-create – I tried it about two years ago on a pretty large client project and found that it can work only in perfectly designed ecosystem. Where service developers are aware that their JSON streams will be consumed in this way and extra care is taken when crafting JSON responses. In practice, especially contracting practice, this is never the case :(
I always had to fallback to write custom import processing, for each API call. Automating insert/update never worked out of the box and delete is completely impossible to automate. This might just push me to finish the blog draft I have open for a long time, on how I do incoming data parsing and Core Data import.

@radianttap
Copy link
Contributor

radianttap commented Sep 19, 2018

Few more thoughts in general, about Codable.

  • Not every system is JSON, I still see a good deal of XML stuff and had one binary based input stream.
  • Thus not sure about going all-in on JSONEncoder and JSONDecoder
  • Maybe shot for the stars and create CoreDataEncoder and CoreDataDecoder, which would possibly solve the "binary" issue in RFC: Swift Codable Attribute Support #375

Mattt has a good starting point for custom codable encoders.

@JohnEstropia
Copy link

This looks promising, although I do think there are some challenges here:

  • Inserting without an MOC means we cannot handle nested JSON to attach relationships, at least in one pass (attempting to set relationships without an MOC will cause an assertion failure and crash).
  • Regarding "find-or-create", I agree with @radianttap. It sounds convenient but in my experience the conditions for the "find" part are usually app-dependent. For example, there will be many times an entity would have compound keys, or at least a key that is derived from another. At that point I think it's moot to manage that from Codable and it's better to leave uniquing to the app's import logic. (Personally I think it would be more efficient anyway if unique constraints are used instead)
  • Performance.

Some ideas I'll throw in:

  • Since we're already generating code, is there any reason to iterate on values.allKeys and EventCodingKeys.allCases, then switch-case on them? Wouldn't it have the same effect to just grind through each decode() and encode() directly?
  • If the init(from:) implementation would depend heavily on metadata such as .context and . deferInsertion, I think it's better to provide a generic extension for Decoder (and Encoder) that would wrap the needed metadata. It would significantly reduce the required unit tests.
  • Most JSON values from most web API are not flat but multi-nested. I think there needs a mechanism to generate container.nestedContainer(...) in compound keys like "profile.name" (which I don't think CodingKeys support)
  • Not sure how far in the import you are planning to consider, but I think utilizing child contexts is a considerable alternative to the null-context approach.
  • To reduce the code needed to be generated, you can let the Decoder.userInfo hold something like [CodingKey: NSPropertyDescription] so the key-values can be iterated on instead. I'm not sure which would be more performant though.

All in all, I think the approach above looks sound. I just worry of the workflows that people would end up after this one decoding process. For instance,

let decoder = JSONDecoder()
decoder.userInfo = [
    .context: context,
    .deferInsertion: true
]
let newEvents = try decoder.decode([Event].self, from: jsonArrayData)
// How to process relationships?

let newPerson = try decoder.decode(Person.self, from: jsonData)
// We reused the decoder here, was that ok?

@atomicbird
Copy link
Collaborator Author

@radianttap Thanks for the feedback.

I sympathize with the problem of inconsistent or even incorrect JSON. I've encountered it myself. I don't think there's a single solution that applies everywhere though, because every broken server API is broken in a different way. I don't think mogenerator can aim to solve all of those problems, so I'm hoping for a solid implementation that works in many cases but doesn't claim to be the best answer for everyone. If nothing else, mogenerator's code might be a good starting point for a more custom Codable implementation.

As for the focus on JSON, that's not really central to the code. The sample Codable methods don't assume that they're decoding JSON, they're working with Codable using whatever data it provides. The same code would work with PropertyListDecoder or with custom Codable implementations.

I'd be interested to hear more of your thoughts on CoreDataEncoder and CoreDataDecoder. The code in this issue is intended for consuming and producing server data, which would generally be JSON or XML, so I'm not sure what format the Core Data-specific classes would use.

@atomicbird
Copy link
Collaborator Author

@JohnEstropia Thanks for the feedback. Going over various points:

  • Using a child context might be a better choice than a nil context, I'll have to experiment a little. For what its worth though, using a nil context shouldn't interfere with relationships as long as the related objects are saved at the same time.
  • As I mentioned to @radianttap, I don't expect this implementation to solve every need for Codable, because as both of you note the requirements are so diverse as to make that impossible. The goal here is a solid implementation while understanding that it may not be useful everywhere. It would be possible to extend this for common additional needs in a later update.
  • You're completely right about iterating over all the keys. I included it only because it's what I'm doing in my own code now, to avoid accidentally missing a key. But my current code is hand written instead of generated, and generated code doesn't need to do that.
  • I'm not sure how to go about implementing the generic Decoder and Encoder extensions you mention, but they sound like a good idea. I'll see what I can come up with along those lines. In the meantime if you have a sample implementation in mind please let me know.
  • It's OK to reuse a decoder. The state during decoding depends on the data you're decoding, so using the same decoder for different data coming from different Data blobs is fine.

@atomicbird
Copy link
Collaborator Author

@JohnEstropia I was looking into your suggestion of extending Decodable and Encodable today. I like the idea of wrapping them up, not least to provide clear type info instead of the Any values that userInfo provides. I'm not sure it'll work though, but I might not be understanding what you have in mind.

I tried this approach for the managed object context:

extension Decoder {
    var context: NSManagedObjectContext? {
        set {
            userInfo[.context] = newValue
        }
        get {
            return userInfo[.context] as? NSManagedObjectContext
        }
    }
}

In theory it's great. In practice, userInfo is a get-only property, so the set code doesn't compile. The get code is fine, but it would only be used in generated code, so that's not much of an advantage.

I thought maybe I could use a custom implementation of Decoder to allow this but I don't think there's any way to make JSONDecoder create instances of that implementation.

If you have something else in mind or can suggest a way to make this work, please let me know. I'd like to do this but right now I don't think it will work.

@JohnEstropia
Copy link

JohnEstropia commented Oct 7, 2018

@atomicbird Ah, I didn't know userInfo was a readonly property for Decoder. What about utility methods for known decoders like JSONDecoder instead? My idea was to simply abstract how the parameters are being passed to the decoder. Something like

extension JSONDecoder {
    func decode<T: NSManagedObject>(_ type: T.Type, data: Data, in context: NSManagedObjectContext, deferInsertion: Bool = false) throws -> T {
        decoder.userInfo[.context] = context
        decoder.userInfo[.deferInsertion] = deferInsertion
        defer {
            decoder.userInfo[.context] = nil
            decoder.userInfo[.deferInsertion] = nil
        }
        return try self.decode(type, from: data)
    }
}

or even rotate the types so that this can be called from the NSManagedObjectContext instead:

extension NSManagedObjectContext {
    func decode<T: NSManagedObject>(_ type: T.Type, data: Data, with decoder: JSONDecoder, deferInsertion: Bool = false) throws -> T {
        decoder.userInfo[.context] = self
        decoder.userInfo[.deferInsertion] = deferInsertion
        defer {
            decoder.userInfo[.context] = nil
            decoder.userInfo[.deferInsertion] = nil
        }
        return try decoder.decode(type, from: data)
    }
}

This is if each unit of decoding is preferred over sharing a JSONDecoder over multiple entities. Otherwise we can provide a convenience initializer instead:

extension JSONDecoder {
    convenience init(context: NSManagedObjectContext) {
        self.init()
        self.userInfo[.context] = context
    }

    func decode<T: NSManagedObject>(_ type: T.Type, data: Data, deferInsertion: Bool = false) throws -> T {
        // ...
    }
}

@atomicbird
Copy link
Collaborator Author

atomicbird commented Oct 10, 2018

@JohnEstropia These look like really good ideas. I'm not sure about depending on JSONDecoder specifically though. It's probably the most commonly used decoder but it's not the only one (and of course people can build their own, and it'd be nice to have something more general purpose). What do you think of something like this? I'm adding a new protocol that covers Apple's decoders and which can be trivially extended to cover any new decoders people might write.

protocol MODecoder: class {
    var userInfo: [CodingUserInfoKey : Any] { get  set }
    func decode<T>(_ type: T.Type, from data: Data) throws -> T where T : Decodable
    func decode<T:NSManagedObject & Decodable>(_ type: T.Type, data: Data, in context: NSManagedObjectContext, deferInsertion: Bool) throws -> T
}

extension JSONDecoder: MODecoder { }
extension PropertyListDecoder: MODecoder {}

extension MODecoder {
    func decode<T:NSManagedObject & Decodable>(_ type: T.Type, data: Data, in context: NSManagedObjectContext, deferInsertion: Bool = false) throws -> T {
        
        userInfo[.context] = context
        userInfo[.deferInsertion] = deferInsertion
        defer {
            userInfo[.context] = nil
            userInfo[.deferInsertion] = nil
        }
        return try self.decode(type, from: data)
    }
}

Then you could create whatever decoder you like and use this method to handle decoding managed objects.

@JohnEstropia
Copy link

Good idea, that's definitely more elegant 👍

@atomicbird
Copy link
Collaborator Author

I think we more or less have agreement here. As I understand @radianttap's concerns, they'll be addressed separately by support for Swift enums (which would not require Codable support). Now to implement the template and core mogenerator code to get it to work. :)

@atomicbird
Copy link
Collaborator Author

I just wanted to let people know that, despite a long delay, I'm working on finishing this up, and making good progress. There have been a number of sticking points I didn't anticipate but I'm working through them one by one.

The most recent one is that under the current code, a numeric attribute which does not have "use scalar attribute" checked in the data model is synthesized as an NSNumber property in code. That makes good conceptual sense and I don't think it should change.

However, NSNumber does not adopt Codable, which means that these properties need a little extra work to encode. No worries, it's working, and with any luck I'll have a commit you can check out soon.

In the meantime, anyone interested in how mogenerator works might be interested in the new mogenerator internals document I added to share things I've learned in coming to grips with the code.

@atomicbird
Copy link
Collaborator Author

I've run into some conceptual difficulties that I'm not sure how to fix, as a result of which I think this feature may be dropped for now, so as to not keep holding up other Swift changes.

The current sticking point is what to do about relationships to entities that do not have a custom class. I had been thinking that the code could assume relationships would be to other custom classes, which would also get Codable support as outlined here, and that would be fine. If there's no custom subclass then that doesn't work. There may be a way to deal with that but I don't know what it is right now and I think I've been hung up on it for long enough that it's time to stop.

Score one for the built-in tests; I was blinded by the fact that I always have custom subclasses, and it didn't occur to me to handle that when I suggested this.

@rentzsch
Copy link
Owner

Score one for the built-in tests; I was blinded by the fact that I always have custom subclasses, and it didn't occur to me to handle that when I suggested this.

Awesome, glad it caught this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants